Comparison of different methods for cluster analysis (Case study: Kermanshah oak forest, Iran)

Document Type: Research article

Authors

1 Ph.D. Student, Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran

2 Associate Prof., Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran

3 Assistant Prof., Department of Mathematics, Faculty of Science, Urmia University, Urmia, Iran

4 Prof., Department of Ecology, Montana State University, Bozeman, USA

Abstract

Vegetation classification is an essential tool for describing, understanding, predicting and managing ecosystems. The aim of this study was to compare different hierarchical clustering methods. Three forest patches with similar slope and altitude gradients were selected on the southern slopes of the Chahar Zebar forests, Kermanshah province. In each patch, vegetation was sampled at 0, 25, 50, 100 and 150 m along each of three transects spaced 200 m apart. The samples were classified by cluster analysis. Gower's distance (the complement of Gower's similarity) was first computed between each pair of samples across all variables, and the samples were then merged using the nearest neighbor (single linkage), complete linkage, average linkage and Ward's methods. The optimal number of clusters and the quality of the clusters were evaluated with silhouette widths. In addition, the cophenetic correlation coefficient was computed to measure the correlation between each dendrogram and the original distance matrix. The results showed that two was the optimal number of clusters for the oak stands. Moreover, the cophenetic correlation coefficients of the nearest neighbor and average methods were higher than those of the complete linkage and Ward's methods. Based on silhouette widths, the nearest neighbor and average methods also produced higher-quality clusters than the other two methods. However, the mean silhouette width of the second cluster produced by the nearest neighbor method was low. Considering the drawbacks of the nearest neighbor method, the average method is recommended for clustering categorical data.
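The workflow summarized above (Gower distance, four agglomerative linkage rules, cophenetic correlation and silhouette widths) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the plot-by-species matrix is hypothetical, every variable is assumed to be an integer-coded cover class scaled by its range (Gower's rule for quantitative variables, not a dedicated ordinal extension), and all function names are our own. Linkage updates use the Lance-Williams recurrences; Ward's method is applied to squared distances in the same way as SciPy's `ward` option, even though Ward's criterion strictly assumes Euclidean distances, which Gower distances need not be.

```python
import math
from itertools import combinations

def gower_matrix(rows):
    # Gower distance between samples; ASSUMES all variables are numeric or
    # integer-coded cover classes: d_ij = mean_k |x_ik - x_jk| / range_k.
    n, p = len(rows), len(rows[0])
    ranges = [max(r[k] for r in rows) - min(r[k] for r in rows) for k in range(p)]
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Variables with zero range carry no information and are skipped here.
        terms = [abs(rows[i][k] - rows[j][k]) / ranges[k] for k in range(p) if ranges[k] > 0]
        d[i][j] = d[j][i] = sum(terms) / len(terms) if terms else 0.0
    return d

def lance_williams(method, dki, dkj, dij, ni, nj, nk):
    # Distance from cluster k to the newly merged cluster i+j.
    if method == "single":  # nearest neighbor
        return min(dki, dkj)
    if method == "complete":  # farthest neighbor
        return max(dki, dkj)
    if method == "average":  # UPGMA: size-weighted mean
        return (ni * dki + nj * dkj) / (ni + nj)
    if method == "ward":  # recurrence on squared distances, as in SciPy's "ward"
        sq = ((ni + nk) * dki**2 + (nj + nk) * dkj**2 - nk * dij**2) / (ni + nj + nk)
        return math.sqrt(max(sq, 0.0))
    raise ValueError(f"unknown method: {method}")

def agglomerate(d, method, k):
    """Naive O(n^3) agglomeration; returns (cophenetic matrix, labels at k clusters)."""
    n = len(d)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member sample indices
    dist = {(i, j): d[i][j] for i, j in combinations(range(n), 2)}  # keys: (smaller id, larger id)
    coph = [[0.0] * n for _ in range(n)]
    labels = list(range(n)) if k >= n else None
    next_id = n
    while len(clusters) > 1:
        (a, b), h = min(dist.items(), key=lambda kv: kv[1])
        # Each pair with one member in a and one in b is first joined at height h.
        for x in clusters[a]:
            for y in clusters[b]:
                coph[x][y] = coph[y][x] = h
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        dab = dist.pop((a, b))
        for c in clusters:
            dca = dist.pop((min(a, c), max(a, c)))
            dcb = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_id)] = lance_williams(method, dca, dcb, dab, na, nb, len(clusters[c]))
        clusters[next_id] = merged
        next_id += 1
        if len(clusters) == k:  # cut the tree into k groups
            labels = [0] * n
            for lab, members in enumerate(clusters.values()):
                for m in members:
                    labels[m] = lab
    return coph, labels

def cophenetic_correlation(d, coph):
    # Pearson correlation between original and cophenetic distances over all pairs.
    pairs = list(combinations(range(len(d)), 2))
    x = [d[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx, syy = sum((u - mx) ** 2 for u in x), sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

def silhouette_widths(d, labels):
    # s(i) = (b - a) / max(a, b); singletons get 0. Requires at least two clusters.
    n = len(d)
    widths = []
    for i in range(n):
        same = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            widths.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in set(labels) if c != labels[i])
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths

# Hypothetical plots x species cover classes (NOT the study's data).
plots = [[3, 0, 1, 2], [3, 1, 1, 2], [2, 0, 0, 3],
         [0, 3, 2, 0], [1, 3, 3, 0], [0, 2, 3, 1]]
D = gower_matrix(plots)
for method in ("single", "complete", "average", "ward"):
    coph, labels = agglomerate(D, method, k=2)
    s = silhouette_widths(D, labels)
    print(method, round(cophenetic_correlation(D, coph), 3), labels, round(sum(s) / len(s), 3))
```

Each printed row gives the linkage method, its cophenetic correlation, the two-group membership and the mean silhouette width, i.e. the quantities compared in the abstract. For real analyses, `scipy.cluster.hierarchy.linkage` and `cophenet` provide the same linkage rules and cophenetic correlations without the naive O(n^3) loop.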
