Cluster Analysis: Using All The Information
Francis L. Battye
The Walter & Eliza Hall Institute
The foundation of any flow cytometry data analysis is either the discovery of all populations of cells that are similar with respect to the measurements made on the sample or, if the component cell populations are already known, their delineation prior to enumeration or further analysis. Traditionally, this discovery or delineation has resulted from an interaction between scientist and data involving the step-wise assignment of limits ("regions" or "windows") to single measured parameters or to pairs of them. With multi-parameter data, this technique is at best tedious and at worst erroneous.
An alternative to this interactive process is a strictly mechanical "cluster analysis" that considers all measured parameters simultaneously. The computer program described in this work, developed to perform the task of classification, has been given no knowledge of flow cytometry; it seeks to classify cells simply according to their relative proximities as points in the n-dimensional space defined by the n measured cytometry parameters.
Two different approaches have been tested for the assignment of cells to "cohorts" or "clusters". In the first, the "mutual near neighbour" (mnn) approach, two cells (A and B) are assigned to the same cluster if they are mutual "mth nearest neighbours", i.e., no more than m cells are closer to A than B is, and no more than m cells are closer to B than A is. The "mutual nearness" requirement is aimed at distinguishing sparse clusters which lie near denser clusters. Membership of a cluster is extended to cells which are mnns of mnns of mnns, and so on. Clusters are assigned in a single pass through the data, and no particular ordering of the cells is required.
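The mutual near neighbour rule can be illustrated with a minimal Python sketch. It is not the program described in this work: the function names `near_sets` and `mnn_clusters`, the use of Euclidean distance, the brute-force neighbour search, the tie-breaking by cell index and the union-find bookkeeping are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance (an assumed metric), Python 3.8+


def near_sets(points, m):
    """For each cell, the indices of its m + 1 closest other cells.

    Any cell B in A's set has at most m cells closer to A than B is.
    Ties in distance are broken by index order (an assumption).
    """
    n = len(points)
    sets = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        sets.append(set(others[:m + 1]))
    return sets


def mnn_clusters(points, m):
    """Label cells so that mutual near neighbours share a cluster."""
    near = near_sets(points, m)
    parent = list(range(len(points)))  # union-find forest (hypothetical helper)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Single pass: merge the clusters of every mutually near pair, so
    # membership extends to mnns of mnns of mnns, and so on.
    for a in range(len(points)):
        for b in near[a]:
            if a in near[b]:
                parent[find(a)] = find(b)

    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = mnn_clusters(cells, m=1)
print(labels)
```

With two tight, well-separated groups of three cells and m = 1, each group forms its own cluster. The brute-force neighbour search is quadratic in the number of cells, so real list-mode data would need a spatial index.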
The second approach is more in line with the way a human operator identifies cell populations in two dimensions: by point density. The cells are first ordered according to the density of other cells surrounding them in n-space, and each cell is then assigned to the densest cluster to which any of its neighbours belongs, or to a new cluster if necessary.
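The ordered density idea can be sketched in a similar way. The abstract does not say how density or neighbourhood is measured, so this sketch assumes a fixed `radius`: a cell's density is the number of other cells within that radius, and those same cells are its neighbours. The "densest cluster" is taken to be the one seeded earliest, i.e. by the densest cell. The name `density_clusters` is hypothetical.

```python
from math import dist  # Euclidean distance (an assumed metric), Python 3.8+


def density_clusters(points, radius):
    """Assign cells to clusters in order of decreasing local density."""
    n = len(points)
    # Neighbours of each cell: all other cells within `radius` (assumption).
    nbrs = [[j for j in range(n)
             if j != i and dist(points[i], points[j]) <= radius]
            for i in range(n)]

    # Densest cells first; the sort is stable, so ties keep index order.
    order = sorted(range(n), key=lambda i: -len(nbrs[i]))

    labels = [None] * n
    next_label = 0
    for i in order:
        seen = [labels[j] for j in nbrs[i] if labels[j] is not None]
        if seen:
            # Clusters are created in decreasing order of seed density, so
            # the smallest label is the cluster with the densest seed.
            labels[i] = min(seen)
        else:
            # No neighbour has been labelled yet: start a new cluster.
            labels[i] = next_label
            next_label += 1
    return labels


cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
density_labels = density_clusters(cells, radius=1.5)
print(density_labels)
```

In this sketch an isolated cell with no neighbours becomes a singleton cluster. Whether the original program treats such cells as clusters or as noise is not stated in the abstract.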
Each approach has been tested on real flow cytometry data of different types, with 6, 7 and 8 parameters. The early findings indicate that the mnn approach, while effective at identifying clusters which are reasonably well separated in n-space (e.g. peripheral blood cell subsets), copes poorly where there is no positive/negative dichotomy or, in particular, in samples of immature cells where there is a differentiation pathway between phenotypes. In this latter case, the ordered density approach has been shown to detect more subtle differences between groups of cells.



