Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions
Background
It has recently been shown that detecting gene fusion events across genomes can be used to predict functional associations of proteins, including physical interaction or complex formation. To obtain such predictions, we performed an exhaustive search for gene fusion events across 24 available, completely sequenced genomes.
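The core inference behind fusion analysis can be sketched in a few lines. Two separate proteins A and B from a query genome are predicted to be functionally associated when both are similar to different regions of a single fused, composite protein C in another genome, and A and B are not simply similar to each other. The sketch below is a minimal illustration under stated assumptions, not the authors' protocol. The `Hit` record, the helper names, and the `max_overlap` tolerance of 20 residues are all hypothetical. Hits are assumed to come from searching the query genome against one other genome, with coordinates given on the target sequence. The published method applies further filtering that is not reproduced here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    """A query protein aligned to a region of a target protein.

    start/end are 1-based, inclusive positions on the TARGET sequence.
    This record layout is a hypothetical simplification, not the paper's format.
    """
    query: str
    target: str
    start: int
    end: int


def overlap(a: Hit, b: Hit) -> int:
    """Number of target residues covered by both hits (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def predict_fusion_pairs(hits, similar_queries=frozenset(), max_overlap=20):
    """Predict associated query pairs from gene fusion evidence.

    Two query proteins A and B are paired when both align to the same target C
    over essentially distinct regions (overlap <= max_overlap residues) and A and
    B are not similar to each other (pairs listed in similar_queries are skipped).
    max_overlap=20 is an arbitrary, hypothetical tolerance.
    Returns {frozenset({A, B}): set of composite target ids}.
    """
    by_target = {}
    for h in hits:
        by_target.setdefault(h.target, []).append(h)

    predictions = {}
    for target, target_hits in by_target.items():
        for a, b in combinations(target_hits, 2):
            pair = frozenset((a.query, b.query))
            if len(pair) < 2 or pair in similar_queries:
                continue  # same protein twice, or A and B resemble each other
            if overlap(a, b) <= max_overlap:
                predictions.setdefault(pair, set()).add(target)
    return predictions


# Toy example: qA and qB cover the two halves of composite C1;
# qD covers the same region as qA, so it pairs with qB but not with qA.
example = predict_fusion_pairs([
    Hit("qA", "C1", 1, 250),
    Hit("qB", "C1", 240, 520),
    Hit("qD", "C1", 10, 230),
])
```

In the toy example, `example` maps both `frozenset({"qA", "qB"})` and `frozenset({"qB", "qD"})` to `{"C1"}`. Grouping hits by target keeps the pairwise comparison local to each candidate composite protein, which is where the fusion signal lives.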
Results
Each genome was used as a query against the remaining 23 complete genomes to detect gene fusion events. Using an improved, fully automatic protocol, we detected a total of 7,224 single-domain proteins that are components of gene fusions in other genomes, many of them identified for the first time. Across all genomes, the total number of predicted pairwise functional associations is 39,730. Component pairs were identified through their similarity to 2,365 multidomain composite proteins. We also show, for the first time, that gene fusion is a complex evolutionary process shaped by a number of contributory factors, including paralogy, genome size and phylogenetic distance. On average, 9% of the genes in a given genome appear to code for single-domain component proteins predicted to be functionally associated. These components are detected by means of an additional 4% of genes, which code for fused, composite proteins.
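The all-against-all design and the per-genome tallies described above can be sketched as follows. This is an illustrative outline only. The genome names, helper functions, and toy numbers are hypothetical placeholders, not data from the study. It shows that 24 genomes, each searched against the other 23, give 552 directed searches. It also shows how a genome's distinct predicted pairs, component fraction, and composite fraction might be computed once the fusion predictions are in hand.

```python
from itertools import permutations


def all_vs_all(genomes):
    """Ordered (query, target) pairs: every genome is searched against every other one."""
    return list(permutations(genomes, 2))


def summarize_genome(n_genes, predicted_pairs, composite_ids):
    """Per-genome summary of fusion-based predictions.

    predicted_pairs: 2-element frozensets of this genome's component gene ids;
        the same pair may be reported by several composites or target genomes.
    composite_ids: this genome's genes that code for fused, composite proteins.
    Returns (number of distinct pairs, component fraction, composite fraction).
    """
    pairs = set(predicted_pairs)            # de-duplicate repeated predictions
    components = set().union(*pairs)        # genes appearing in at least one pair
    return (
        len(pairs),
        len(components) / n_genes,
        len(set(composite_ids)) / n_genes,
    )


genomes = [f"genome_{i:02d}" for i in range(1, 25)]   # 24 hypothetical placeholder names
searches = all_vs_all(genomes)                         # 24 x 23 = 552 directed searches

# Toy genome of 100 genes; the g1-g2 pair is reported twice.
summary = summarize_genome(
    100,
    [frozenset({"g1", "g2"}), frozenset({"g2", "g3"}), frozenset({"g1", "g2"})],
    ["g50", "g51"],
)
```

Here `summary` is `(2, 0.03, 0.02)`: two distinct pairs, three component genes out of 100, and two composite genes out of 100. Using `permutations` rather than `combinations` reflects that each genome acts in turn as the query, so the A-versus-B and B-versus-A searches are counted separately.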
Conclusions
These results provide an exhaustive set of functionally associated genes and also delineate the power of fusion analysis for the prediction of protein interactions.
Complete Metadata
| @type | dcat:Dataset |
|---|---|
| accessLevel | public |
| bureauCode | ["009:25"] |
| contactPoint | {"fn": "NIH", "@type": "vcard:Contact", "hasEmail": "mailto:info@nih.gov"} |
| distribution | [{"@type": "dcat:Distribution", "title": "Official Government Data Source", "mediaType": "text/html", "description": "Visit the original government dataset for complete information, documentation, and data access.", "downloadURL": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC65099/"}] |
| identifier | https://healthdata.gov/api/views/sz65-dtgd |
| issued | 2025-07-14 |
| keyword | ["gene-fusion", "genomic-analysis", "nih", "protein-associations", "protein-interaction"] |
| landingPage | https://healthdata.gov/d/sz65-dtgd |
| modified | 2025-09-06 |
| programCode | ["009:033"] |
| publisher | {"name": "National Institutes of Health", "@type": "org:Organization"} |
| theme | ["NIH"] |
| title | Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions |