Comparison of the performance of two non-hierarchical clustering methods on vegetation community datasets

Document Type: Scientific article

Authors

1 Ph.D. of Forestry, Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran

2 Corresponding author, Prof., Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran

3 Assistant Prof., Department of Mathematics, Faculty of Science, Urmia University, Urmia, Iran

4 Associate Prof., Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran

5 Prof., Department of Ecology, Montana State University, Bozeman, USA

Abstract

     Clustering summarizes high-dimensional vegetation datasets, which serve as indicators of environmental change, and helps in interpreting patterns in ecosystems. A variety of clustering methods is available, and the challenge is to choose an appropriate one. The aim of this research was to compare two non-hierarchical clustering methods, K-means and K-medoids, in forest ecosystems. For this purpose, two real datasets from the Hyrcanian and Zagros forests of Iran and six simulated datasets were used. The Hellinger transformation was applied before calculating the dissimilarity matrices. Euclidean distance, Manhattan distance and Bray-Curtis dissimilarity were then calculated on the transformed datasets. Three evaluators were chosen: silhouette width, the phi coefficient and ISAMIC. The results show that the combinations of the Bray-Curtis dissimilarity matrix with K-means and with K-medoids ranked first and second, respectively, among the clustering methods tested. K-means clustering was more effective on heterogeneous datasets such as the Zagros and simulated datasets. The weakest clustering approach was the combination of Manhattan distance and K-medoids. The results also show that the Hellinger transformation improved the performance of the Euclidean distance matrix. Overall, the combination of Bray-Curtis dissimilarity with K-means performed best and is recommended.
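The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which software was used, so this is plain Python written for illustration only, not the authors' implementation. The abundance table `sites`, the cluster assignment `labels`, and all function names are hypothetical. The sketch applies the Hellinger transformation, defines the three dissimilarity measures, and computes the average silhouette width from a precomputed dissimilarity matrix.

```python
import math

def hellinger(table):
    """Hellinger transformation: square root of each site's relative abundances."""
    out = []
    for row in table:
        total = sum(row)
        out.append([math.sqrt(v / total) if total > 0 else 0.0 for v in row])
    return out

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def bray_curtis(a, b):
    denom = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / denom if denom > 0 else 0.0

def distance_matrix(rows, metric):
    """Full symmetric matrix of pairwise dissimilarities between sites."""
    return [[metric(a, b) for b in rows] for a in rows]

def mean_silhouette(dist, labels):
    """Average silhouette width computed from a precomputed dissimilarity matrix."""
    n = len(labels)
    widths = []
    for i in range(n):
        # Collect dissimilarities from site i to every other site, by cluster.
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist[i][j])
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster, or only one cluster overall: width set to 0.
            widths.append(0.0)
            continue
        a = sum(own) / len(own)                                 # within-cluster mean
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n

# Hypothetical data: 4 sites (rows) by 3 species (columns), and a 2-cluster partition.
sites = [[10, 0, 0], [8, 2, 0], [0, 1, 9], [0, 0, 10]]
labels = [0, 0, 1, 1]

transformed = hellinger(sites)
for name, metric in [("Euclidean", euclidean),
                     ("Manhattan", manhattan),
                     ("Bray-Curtis", bray_curtis)]:
    d = distance_matrix(transformed, metric)
    print(name, round(mean_silhouette(d, labels), 3))
```

Computing the silhouette from a precomputed matrix keeps the evaluator independent of the dissimilarity measure, so the same partition can be scored under each of the three measures. The phi coefficient and ISAMIC are not covered by this sketch.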
