Clustering
Background
Cluster analysis or clustering is the assignment of a set of
observations into subsets (called clusters) so that observations in the same
cluster are similar in some sense. Clustering is a method of unsupervised
learning, and a common technique for statistical data analysis used in many
fields, including bioinformatics.
Algorithms
Hierarchical
clustering creates a hierarchy of clusters which may be represented in a tree
structure called a dendrogram. The root of the tree consists of a single
cluster containing all observations, and the leaves correspond to individual
observations.
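The tree structure described above can be illustrated with a minimal sketch of agglomerative (bottom-up) clustering. It assumes single linkage and Euclidean distance, neither of which is specified here, and the function names `single_linkage` and `leaves` are hypothetical. It is not the implementation used by Cluster 3.0 or R.

```python
import math
from itertools import combinations


def leaves(tree):
    """Return the observation indices found under a dendrogram node."""
    if isinstance(tree, int):
        return [tree]  # a leaf is a single observation
    left, right = tree
    return leaves(left) + leaves(right)


def single_linkage(points):
    """Build a dendrogram as nested pairs by repeatedly merging the two closest clusters."""

    def distance(a, b):
        # Single linkage (an assumption): distance between the closest members.
        return min(math.dist(points[i], points[j])
                   for i in leaves(a) for j in leaves(b))

    clusters = list(range(len(points)))  # start with one cluster per observation
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        merged = (clusters[i], clusters[j])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters[0]  # the root, containing every observation


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
tree = single_linkage(points)
print(tree)
print(sorted(leaves(tree)))
```

The returned root covers all observations, and each integer in the nested pairs is a leaf that refers to one observation by its index.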
The k-means
algorithm assigns each point to the cluster whose center (also called the
centroid) is nearest. The center is the average of all the points in the
cluster; that is, its coordinates are the arithmetic mean for each dimension
separately over all the points in the cluster.
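As a rough illustration of the assignment and averaging steps, the following sketch alternates between the two until the assignments stop changing. Squared Euclidean distance, random initial centers drawn from the data, and the names `kmeans`, `mean_point`, and `squared_distance` are assumptions made for this example. It is not the Cluster 3.0 or R implementation.

```python
import random


def mean_point(points):
    """Arithmetic mean of each dimension separately."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Return (labels, centers) after alternating assignment and update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k random data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move either
        labels = new_labels
        # Update step: each center becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = mean_point(members)
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

The result depends on the initial centers, so in practice the procedure is often repeated from several random starts.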
A self-organizing map (SOM) or self-organizing
feature map (SOFM) is a type of artificial neural network that is
trained using unsupervised learning to produce a low-dimensional (typically
two-dimensional), discretized representation of the input space of the training
samples, called a map.
Self-organizing maps are different from other artificial neural networks in the
sense that they use a neighborhood function to preserve the topological
properties of the input space.
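The role of the neighborhood function can be sketched as follows. This example assumes a rectangular grid, a Gaussian neighborhood, and linearly decaying learning rate and radius; none of these choices come from this page, and the names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to sample x."""
    return min(weights,
               key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)))


def train_som(samples, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    radius0 = max(rows, cols) / 2  # assumption: initial radius is half the grid size
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        for x in rng.sample(samples, len(samples)):  # visit samples in random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                # Gaussian neighborhood: nodes near the winner on the grid move more.
                h = math.exp(-grid_d2 / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


samples = [[0.1, 0.1], [0.15, 0.2], [0.9, 0.8], [0.85, 0.95]]
som = train_som(samples, rows=2, cols=3)
for s in samples:
    print(s, "->", best_matching_unit(som, s))
```

Because neighboring nodes are pulled toward the same samples, nodes that are close on the grid tend to end up with similar weight vectors, which is how the map keeps nearby inputs near each other.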
Analysis
Simbiot microarray analysis integrates a total of six implementations of three
clustering algorithms. Each algorithm (k-means, SOM, and hierarchical) is
available both via Cluster 3.0 (de Hoon, Imoto et al. 2004) and via built-in R
functions.
For more information about the individual algorithms, please follow the
links below:
Cluster 3.0: (k-means, SOM, Hierarchical)
R: k-means
R: SOM
R: Hierarchical
Free demo accounts are available at http://www.simbiot.net.
Please also see more information about Simbiot Single User
Accounts and Private Server installations as well as a brief introduction to microarray analysis.
References
de Hoon, M. J., S. Imoto, et al. (2004). "Open source clustering software."
Bioinformatics 20(9): 1453-4.