Model-based cluster analysis of microarray gene-expression data

Published by National Institutes of Health | U.S. Department of Health & Human Services | Metadata Last Checked: September 07, 2025 | Last Modified: 2025-09-06

Background: Microarray technologies are emerging as a promising tool for genomic studies. The challenge now is how to analyze the resulting large amounts of data. Clustering techniques have been widely applied to microarray gene-expression data. However, normal mixture model-based cluster analysis has not been widely used for such data, although it has a solid probabilistic foundation. Here, we introduce and illustrate its use in detecting differentially expressed genes. In particular, we cluster not the gene-expression patterns themselves but a summary statistic, the t-statistic.

Results: The method is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle-ear infection. Three clusters were found. Two of them together contain more than 95% of the genes and show almost no change in gene-expression levels, whereas the third contains 30 genes with differential expression of varying degree.

Conclusions: Our results indicate that model-based clustering of t-statistics (and possibly other summary statistics) can be a useful statistical tool to exploit differential gene expression for microarray data.
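To make the idea of clustering t-statistics concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The data are simulated rather than taken from the rat data set, and the Welch form of the t-statistic, the EM initialisation, the choice of two components, and the 0.5 posterior cutoff are all assumptions made for illustration. The function names (`welch_t`, `fit_normal_mixture`) are hypothetical.

```python
import math
import random

def welch_t(x, y):
    """Two-sample Welch t-statistic for one gene (assumed form of the statistic)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

def normal_pdf(z, mean, var):
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step(data, weights, means, variances):
    """Posterior probability of each component for every data point."""
    resp = []
    for z in data:
        dens = [w * normal_pdf(z, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against numerical underflow
        resp.append([d / total for d in dens])
    return resp

def fit_normal_mixture(data, k, n_iter=200, var_floor=1e-3):
    """Fit a k-component univariate normal mixture by EM."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Assumed initialisation: means spread evenly over the data range,
    # equal weights, and the overall variance for every component.
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    centre = sum(data) / n
    variances = [sum((z - centre) ** 2 for z in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = e_step(data, weights, means, variances)
        for j in range(k):  # M-step: weighted updates of each component
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * z for r, z in zip(resp, data)) / nj
            spread = sum(r[j] * (z - means[j]) ** 2 for r, z in zip(resp, data)) / nj
            variances[j] = max(spread, var_floor)  # floor avoids collapsed components
    return weights, means, variances, e_step(data, weights, means, variances)

# Simulated data (not the rat data set): 300 genes, 8 arrays per group,
# with the last 30 genes shifted upward in the infected group.
rng = random.Random(0)
n_genes, n_diff, n_per_group, shift = 300, 30, 8, 4.0
t_stats = []
for g in range(n_genes):
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    offset = shift if g >= n_genes - n_diff else 0.0
    infected = [rng.gauss(offset, 1.0) for _ in range(n_per_group)]
    t_stats.append(welch_t(infected, control))

weights, means, variances, resp = fit_normal_mixture(t_stats, k=2)

# Heuristic (an assumption): treat the component whose mean lies farthest
# from zero as the "differentially expressed" cluster.
diff_comp = max(range(len(means)), key=lambda j: abs(means[j]))
flagged = [g for g, r in enumerate(resp) if r[diff_comp] > 0.5]
print("component weights:", [round(w, 3) for w in weights])
print("component means:", [round(m, 2) for m in means])
print("genes assigned to the differential cluster:", len(flagged))
```

In practice the number of components would be chosen from the data, for example by comparing fits with different `k` using a model-selection criterion, rather than fixed in advance as it is in this sketch.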


Complete Metadata

@type	dcat:Dataset
accessLevel	public
bureauCode	[ "009:25" ]
contactPoint	{ "fn": "NIH", "@type": "vcard:Contact", "hasEmail": "mailto:info@nih.gov" }
description	Background: Microarray technologies are emerging as a promising tool for genomic studies. The challenge now is how to analyze the resulting large amounts of data. Clustering techniques have been widely applied to microarray gene-expression data. However, normal mixture model-based cluster analysis has not been widely used for such data, although it has a solid probabilistic foundation. Here, we introduce and illustrate its use in detecting differentially expressed genes. In particular, we cluster not the gene-expression patterns themselves but a summary statistic, the t-statistic. Results: The method is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle-ear infection. Three clusters were found. Two of them together contain more than 95% of the genes and show almost no change in gene-expression levels, whereas the third contains 30 genes with differential expression of varying degree. Conclusions: Our results indicate that model-based clustering of t-statistics (and possibly other summary statistics) can be a useful statistical tool to exploit differential gene expression for microarray data.
distribution	[ { "@type": "dcat:Distribution", "title": "Official Government Data Source", "mediaType": "text/html", "description": "Visit the original government dataset for complete information, documentation, and data access.", "downloadURL": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC65687/" } ]
identifier	https://healthdata.gov/api/views/yh42-xkaf
issued	2025-07-14
keyword	[ "cluster-analysis", "gene-expression", "microarray-data", "nih", "pneumococcal-infection" ]
landingPage	https://healthdata.gov/d/yh42-xkaf
modified	2025-09-06
programCode	[ "009:033" ]
publisher	{ "name": "National Institutes of Health", "@type": "org:Organization" }
theme	[ "NIH" ]
title	Model-based cluster analysis of microarray gene-expression data
