Identification of clusters in tissue samples in gene expression data with Principal Component Analysis based on relative variance matrix

Uzma Nawaz; Asghar Ali

doi:10.5897/AJMR10.645

African Journal of Microbiology Research

Abbreviation: Afr. J. Microbiol. Res.
Language: English
ISSN: 1996-0808
DOI: 10.5897/AJMR
Start Year: 2007
Published Articles: 5227

Full Length Research Paper

Department of Statistics, Bahauddin Zakariya University, 60800, Multan, Pakistan.

Article Number - A635C1612845
Vol. 5(1), pp. 34-43, January 2011
https://doi.org/10.5897/AJMR10.645

Accepted: 09 December 2010
Published: 04 January 2011

Copyright © 2024. The author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0.

Abstract

Principal Component Analysis (PCA) has long been used as a preprocessing step for clustering. We focus on the clustering of tissue samples in gene expression data. Many clustering techniques and algorithms for gene expression data are available in the literature, but the number of clusters remains ambiguous unless one relies on biologically known groups. Across the wide variety of existing clustering techniques, which are based on different similarity or dissimilarity metrics, a consensus on the number of clusters is needed. Conventionally, PCA for clustering either forces every variable to unit variance (PCA on the correlation matrix) or allows the large variance of an individual variable to dominate the entire PCA result (PCA on the raw covariance matrix). We propose PCA based on a relative variance-covariance matrix, which gives due consideration to both the joint and the individual variances in the dataset, and we identify clusters from the principal component loadings. We show empirically that the proposed PCA approach is conclusively more informative than the available approaches for identifying cluster structure in tissue samples (sample expression profiles). The resulting clusters are consistent with existing results on the dataset under study and have a valid biological basis.
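The abstract contrasts three dispersion matrices for PCA but does not define the relative variance-covariance matrix, so the following is only a minimal sketch under a stated assumption. It takes the relative covariance of samples i and j to be s_ij / (x̄_i x̄_j), the covariance divided by the product of the two sample means; this form is assumed and is not necessarily the authors' definition. Genes are rows and tissue samples are columns, so the loadings describe samples. The helper names (`dispersion_matrices`, `pca_loadings`, `group_by_loadings`), the rule of assigning each sample to the leading component on which it has its largest absolute loading, and the synthetic data are all hypothetical illustrations.

```python
import numpy as np


def dispersion_matrices(x):
    """Return covariance, correlation and relative covariance matrices.

    x has genes in rows and tissue samples in columns, so the samples are
    treated as the variables whose loadings are inspected.
    """
    cov = np.cov(x, rowvar=False)              # raw covariance: large-variance samples dominate
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)              # correlation: every sample forced to unit variance
    means = x.mean(axis=0)
    if np.any(means <= 0):
        raise ValueError("relative covariance sketch assumes positive sample means")
    # ASSUMPTION (not taken from the paper): relative covariance s_ij / (mean_i * mean_j)
    rel = cov / np.outer(means, means)
    return cov, corr, rel


def pca_loadings(m):
    """Eigen-decompose a symmetric dispersion matrix, largest eigenvalue first."""
    eigvals, eigvecs = np.linalg.eigh(m)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]   # columns of eigvecs are the loadings


def group_by_loadings(loadings, k=2):
    """HYPOTHETICAL rule: assign each sample to the leading PC (among the
    first k) on which it has the largest absolute loading."""
    return np.argmax(np.abs(loadings[:, :k]), axis=1)


# Synthetic data: 200 genes; samples 0-2 share one profile, samples 3-5 another.
rng = np.random.default_rng(0)
f_a, f_b = rng.standard_normal(200), rng.standard_normal(200)
x = np.column_stack(
    [10 + 2 * f_a + 0.1 * rng.standard_normal(200) for _ in range(3)]
    + [10 + f_b + 0.1 * rng.standard_normal(200) for _ in range(3)]
)

cov, corr, rel = dispersion_matrices(x)
_, rel_loadings = pca_loadings(rel)
groups = group_by_loadings(rel_loadings, k=2)
print(groups)
```

On this synthetic data, where the two sample groups have similar mean levels but different spreads, the grouping should place samples 0-2 and samples 3-5 on separate leading components. Dividing by the product of means rescales each covariance by the expression level rather than forcing unit variance; whether this matches the paper's construction cannot be confirmed from the abstract.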

Key words: Clustering methods, gene expression analysis, principal component analysis, relative variance-covariance matrix, principal component loadings.
