Model-based cluster analysis of microarray gene-expression data
Background
Microarray technologies are emerging as a promising tool for genomic studies. The challenge now is how to analyze the resulting large amounts of data. Clustering techniques have been widely applied in analyzing microarray gene-expression data. However, normal mixture model-based cluster analysis has not been widely used for such data, although it has a solid probabilistic foundation. Here, we introduce and illustrate its use in detecting differentially expressed genes. In particular, we do not cluster gene-expression patterns but a summary statistic, the t-statistic.
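For orientation, the normal mixture model referred to above can be written in its standard form. The notation here ($g$, $\pi_k$, $\mu_k$, $\sigma_k^2$, $\tau_{ik}$) is introduced for illustration and is not taken from the source, which does not give its exact parameterization. With $t_i$ the t-statistic of gene $i$ and $\phi$ the normal density:

$$
f(t_i;\Theta) = \sum_{k=1}^{g} \pi_k \,\phi(t_i;\mu_k,\sigma_k^2), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{g} \pi_k = 1 .
$$

Under this model, a gene would typically be assigned to the cluster with the largest posterior probability

$$
\tau_{ik} = \frac{\pi_k \,\phi(t_i;\mu_k,\sigma_k^2)}{\sum_{j=1}^{g} \pi_j \,\phi(t_i;\mu_j,\sigma_j^2)} .
$$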
Results
The method is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle-ear infection. Three clusters were found: two of them contain more than 95% of the genes and show almost no change in gene-expression levels, whereas the third contains 30 genes showing some degree of differential expression.
Conclusions
Our results indicate that model-based clustering of t-statistics (and possibly other summary statistics) can be a useful statistical tool for detecting differential gene expression in microarray data.
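The approach summarized in this abstract, clustering one t-statistic per gene with a normal mixture, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the unequal-variance (Welch-type) t-statistic, the EM initialization from equal-sized chunks of the sorted data, the use of BIC to choose the number of clusters, and all function names are choices made here, and the demo runs on simulated values rather than the rat data.

```python
import math
import random
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Two-sample t-statistic with unequal variances for one gene (assumed form)."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(values, k, n_iter=100, min_var=1e-3):
    """Fit a k-component 1-D normal mixture by EM.

    Returns (weights, means, variances, log-likelihood).
    Needs at least 2 values per component for the initialization.
    """
    data = sorted(values)
    n = len(data)
    # Initialize each component from one of k equal-sized chunks of the sorted data.
    chunks = [data[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [mean(c) for c in chunks]
    variances = [max(variance(c), min_var) for c in chunks]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances (variance floored at min_var).
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)

    loglik = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        loglik += math.log(dens or 1e-300)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC in the 'lower is better' convention; a k-component mixture has 3k - 1 free parameters."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


if __name__ == "__main__":
    random.seed(0)
    # Simulated stand-in data: 270 unchanged genes and 30 shifted genes, 6 samples per group.
    t_stats = []
    for gene in range(300):
        shift = 3.0 if gene < 30 else 0.0
        control = [random.gauss(0.0, 1.0) for _ in range(6)]
        infected = [random.gauss(shift, 1.0) for _ in range(6)]
        t_stats.append(welch_t(infected, control))

    fits = {k: fit_mixture(t_stats, k) for k in (1, 2, 3)}
    best_k = min(fits, key=lambda k: bic(fits[k][3], k, len(t_stats)))
    print("selected number of clusters:", best_k)
    for w, m, v in zip(*fits[best_k][:3]):
        print(f"weight={w:.3f} mean={m:.2f} variance={v:.2f}")
```

Working on a single summary statistic per gene keeps the mixture one-dimensional, so each component needs only a weight, a mean and a variance. A production analysis would more likely rely on an established mixture-model library and several random restarts of EM.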
Complete Metadata
| @type | dcat:Dataset |
|---|---|
| accessLevel | public |
| bureauCode | ["009:25"] |
| contactPoint | {"fn": "NIH", "@type": "vcard:Contact", "hasEmail": "mailto:info@nih.gov"} |
| description | Background: Microarray technologies are emerging as a promising tool for genomic studies. The challenge now is how to analyze the resulting large amounts of data. Clustering techniques have been widely applied in analyzing microarray gene-expression data. However, normal mixture model-based cluster analysis has not been widely used for such data, although it has a solid probabilistic foundation. Here, we introduce and illustrate its use in detecting differentially expressed genes. In particular, we do not cluster gene-expression patterns but a summary statistic, the t-statistic. Results: The method is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle-ear infection. Three clusters were found: two of them contain more than 95% of the genes and show almost no change in gene-expression levels, whereas the third contains 30 genes showing some degree of differential expression. Conclusions: Our results indicate that model-based clustering of t-statistics (and possibly other summary statistics) can be a useful statistical tool for detecting differential gene expression in microarray data. |
| distribution | [{"@type": "dcat:Distribution", "title": "Official Government Data Source", "mediaType": "text/html", "description": "Visit the original government dataset for complete information, documentation, and data access.", "downloadURL": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC65687/"}] |
| identifier | https://healthdata.gov/api/views/yh42-xkaf |
| issued | 2025-07-14 |
| keyword | ["cluster-analysis", "gene-expression", "microarray-data", "nih", "pneumococcal-infection"] |
| landingPage | https://healthdata.gov/d/yh42-xkaf |
| modified | 2025-09-06 |
| programCode | ["009:033"] |
| publisher | {"name": "National Institutes of Health", "@type": "org:Organization"} |
| theme | ["NIH"] |
| title | Model-based cluster analysis of microarray gene-expression data |