The Adaptive Evolution Database (TAED)
Background
The Master Catalog is a collection of evolutionary families, including multiple sequence alignments, phylogenetic trees and reconstructed ancestral sequences, for all protein-sequence modules encoded by genes in GenBank. It can therefore support large-scale genomic surveys, one of which we present here: The Adaptive Evolution Database (TAED). In TAED, potential examples of positive adaptation are identified by high values of the normalized ratio of nonsynonymous to synonymous nucleotide substitution rates (KA/KS values) on branches of an evolutionary tree between nodes representing reconstructed ancestral sequences.
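For reference, the ratio can be written out as below. This is the conventional definition of KA/KS, with KA counted per nonsynonymous site and KS per synonymous site; how TAED normalizes the values beyond this is not stated here, so the expression should be read as a general sketch rather than the database's exact formula.

```latex
\[
\frac{K_A}{K_S}
  = \frac{\text{nonsynonymous substitutions per nonsynonymous site}}
         {\text{synonymous substitutions per synonymous site}},
\qquad
\frac{K_A}{K_S} > 1 \;\Rightarrow\; \text{candidate positive selection on that branch}
\]
```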
Results
Evolutionary trees and reconstructed ancestral sequences were extracted from the Master Catalog for every subtree containing proteins from the Chordata only or the Embryophyta only. Branches with high KA/KS values were identified. These represent candidate episodes in the history of a protein family when the protein may have undergone positive selection, in which the mutant form conferred greater fitness than the ancestral form. Such episodes are frequently associated with a change in function. An unexpectedly large number of families (between 10% and 20% of those examined) were found to have at least one branch with a KA/KS value above arbitrarily chosen cut-offs (1 and 0.6). Most of these survived a robustness test and were collected into TAED.
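The screening step described above can be sketched in a few lines of Python. This is a minimal illustration, not TAED's actual pipeline: the `Branch` record, the family and branch labels, and the KA and KS numbers are all hypothetical, and the robustness test is not modeled. Only the two cut-offs (1 and 0.6) come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Branch:
    """One branch between two reconstructed nodes (hypothetical record layout)."""
    family: str   # hypothetical family identifier
    label: str    # hypothetical branch label, e.g. "n1->n2"
    ka: float     # nonsynonymous substitutions per nonsynonymous site
    ks: float     # synonymous substitutions per synonymous site


def ka_ks(branch: Branch) -> Optional[float]:
    """Return KA/KS, or None when KS is zero and the ratio is undefined."""
    return branch.ka / branch.ks if branch.ks > 0 else None


def families_above(branches: Iterable[Branch], cutoff: float) -> Set[str]:
    """Families with at least one branch whose KA/KS exceeds the cut-off."""
    hits = set()
    for b in branches:
        ratio = ka_ks(b)
        if ratio is not None and ratio > cutoff:
            hits.add(b.family)
    return hits


# Made-up example data, for illustration only.
branches = [
    Branch("fam1", "n1->n2", 0.30, 0.20),  # ratio about 1.5
    Branch("fam1", "n2->n3", 0.05, 0.50),  # ratio about 0.1
    Branch("fam2", "n1->n2", 0.14, 0.20),  # ratio about 0.7
    Branch("fam3", "n1->n2", 0.02, 0.40),  # ratio about 0.05
    Branch("fam4", "n1->n2", 0.10, 0.00),  # KS is zero, ratio undefined
]

all_families = {b.family for b in branches}
for cutoff in (1.0, 0.6):  # the two cut-offs named in the text
    hits = families_above(branches, cutoff)
    share = len(hits) / len(all_families)
    print(f"cut-off {cutoff}: {sorted(hits)} ({share:.0%} of families)")
```

Branches with an undefined ratio are skipped here rather than treated as hits; that is a design choice of this sketch, and a real survey would need an explicit policy for branches with no synonymous change.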
Conclusions
TAED is a raw resource for bioinformaticists interested in data mining and for experimental evolutionists seeking candidate examples of adaptive evolution for further experimental study. It can be expanded to include other evolutionary information (for example, changes in gene regulation or splicing) placed in a phylogenetic perspective.
Complete Metadata
| @type | dcat:Dataset |
|---|---|
| accessLevel | public |
| bureauCode | ["009:25"] |
| contactPoint | {"fn": "NIH", "@type": "vcard:Contact", "hasEmail": "mailto:info@nih.gov"} |
| distribution | [{"@type": "dcat:Distribution", "title": "Official Government Data Source", "mediaType": "text/html", "description": "Visit the original government dataset for complete information, documentation, and data access.", "downloadURL": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC55325/"}] |
| identifier | https://healthdata.gov/api/views/7ngd-g27v |
| issued | 2025-07-14 |
| keyword | ["adaptive-evolution", "nih", "phylogenetic-trees", "protein-sequences", "sequence-alignment"] |
| landingPage | https://healthdata.gov/d/7ngd-g27v |
| modified | 2025-09-06 |
| programCode | ["009:033"] |
| publisher | {"name": "National Institutes of Health", "@type": "org:Organization"} |
| theme | ["NIH"] |
| title | The Adaptive Evolution Database (TAED) |