Comparison of different cluster analysis methods (case study: oak forests of Kermanshah)

Article type: Scientific research

Authors

1 Ph.D. Student, Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran

2 Associate Professor, Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran

3 Assistant Professor, Department of Mathematics, Faculty of Science, Urmia University, Urmia, Iran

4 Associate Professor, Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran

5 Professor, Department of Ecology, Montana State University, Bozeman, USA

Abstract

Classification is an effective tool for studying plant communities and ecological phenomena. The aim of this study was to compare different clustering methods in cluster analysis. Three south-facing forest patches with similar slope and elevation were selected from oak stands in the Chahar Zebar forests of Kermanshah province. In each patch, samples were taken at distances of 0, 25, 50, 100 and 150 m along three transects spaced 200 m apart. Cluster analysis was used to classify the vegetation. The distance matrix was computed with Gower's method, and four linkage methods were used to join clusters: nearest neighbor, furthest neighbor, average linkage and Ward's linkage. The silhouette criterion was used to find the optimal number of clusters and to assess the clustering quality of each method. In addition, the agreement between the computed distance matrix and the dendrogram obtained from each method was evaluated with the cophenetic correlation coefficient. The results showed that the optimal number of clusters for the oak communities of the study area was two. The cophenetic correlation between the distance matrix and the dendrogram was higher for the average and nearest neighbor methods than for the Ward and furthest neighbor methods. Clustering quality was also better for the nearest neighbor and average methods than for the other two, but the mean silhouette index of the second cluster in the nearest neighbor method was very low. Therefore, average linkage combined with the Gower distance coefficient is preferable for ordinal data, and it does not alter the data.
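The distance step described above can be illustrated with a short Python sketch. It computes a range-normalised Gower distance between sample plots whose species cover is recorded as ordinal class codes. This is only a minimal sketch under stated assumptions: the plot data are hypothetical, the name `gower_distance_matrix` is illustrative, and ordinal codes are simply treated as numeric ranks, which is simpler than rank-based extensions of Gower's coefficient for ordinal characters and is not necessarily the variant used in the study.

```python
def gower_distance_matrix(rows):
    """Range-normalised Gower distance for numeric or rank-coded ordinal data.

    rows: list of equal-length lists, one list of variable values per plot.
    Variables with zero range carry no information and are skipped.
    """
    n_vars = len(rows[0])
    # Range of each variable across all plots (used to scale differences to 0..1).
    ranges = []
    for j in range(n_vars):
        column = [row[j] for row in rows]
        ranges.append(max(column) - min(column))

    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            parts = [
                abs(rows[a][j] - rows[b][j]) / ranges[j]
                for j in range(n_vars)
                if ranges[j] > 0
            ]
            # Gower distance is the mean of the per-variable scaled differences.
            d = sum(parts) / len(parts) if parts else 0.0
            dist[a][b] = dist[b][a] = d
    return dist


# Hypothetical cover-class codes (rows = plots, columns = species).
plots = [
    [1, 2, 0],
    [1, 3, 0],
    [5, 0, 4],
    [4, 0, 5],
]
distances = gower_distance_matrix(plots)
for row in distances:
    print([round(v, 3) for v in row])
```

Because every per-variable difference is divided by that variable's range, each distance lies between 0 and 1 regardless of how many cover classes a species spans.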

Article title [English]

Comparison of different methods for cluster analysis (Case study: Kermanshah oak forest, Iran)

Authors [English]

  • Naghmeh Pakgohar 1
  • Javad Eshaghi Rad 2
  • Gholam Hossein Gholami 3
  • Ahmad Alijanpour 4
  • David W. Roberts 5
1 Ph.D. Student, Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran
2 Associate Prof., Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran
3 Assistant Prof., Department of Mathematics, Faculty of Science, Urmia University, Urmia, Iran
4 Associate Prof., Department of Forestry, Faculty of Natural Resources, Urmia University, Urmia, Iran
5 Prof., Department of Ecology, Montana State University, Bozeman, USA
Abstract [English]

Vegetation classification is an essential tool to describe, understand, predict and manage ecosystems. The aim of this study was to compare different hierarchical clustering methods. Three forest patches with similar slope and altitude, located on the southern slopes of the Chahar Zebar forests, Kermanshah province, were selected. Vegetation was sampled in each patch at distances of 0, 25, 50, 100 and 150 m along three transects spaced 200 m apart. Cluster analysis was used to classify the samples. Gower's distance was used to compute the distance matrix between samples, and clusters were then merged with four linkage methods: nearest neighbor, complete (furthest) neighbor, average linkage, and Ward's method. The optimal number of clusters and the clustering quality were evaluated with the silhouette criterion. In addition, the cophenetic correlation coefficient was computed to evaluate the agreement between each dendrogram and the distance matrix. The results showed that two was the optimal number of clusters for the oak stands. The cophenetic correlation between the distance matrix and the dendrogram was higher for the nearest neighbor and average methods than for the complete neighbor and Ward's methods. Based on the silhouette criterion, the nearest neighbor and average methods also produced higher cluster quality than the other two methods. However, the mean silhouette value was very low for the second cluster of the nearest neighbor method. Considering this disadvantage of the nearest neighbor method, the average method combined with Gower's distance is suggested for clustering ordinal data.
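The comparison workflow in the abstract (link clusters, cut the dendrogram, then score it with the cophenetic correlation and the silhouette criterion) can be sketched in plain Python as follows. This is a hedged illustration, not the authors' code: the distance matrix is hypothetical, all function names are made up for this example, and only single (nearest neighbor), complete (furthest neighbor) and average linkage are implemented. Ward's method is left out to keep the sketch short.

```python
from itertools import combinations


def agglomerate(dist, method="average"):
    """Naive agglomerative clustering; returns merges as (left, right, height)."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    merges, next_id = [], n
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        updated = {}
        for c in clusters:
            if c in (a, b):
                continue
            dac = d[tuple(sorted((a, c)))]
            dbc = d[tuple(sorted((b, c)))]
            if method == "single":
                updated[(c, next_id)] = min(dac, dbc)
            elif method == "complete":
                updated[(c, next_id)] = max(dac, dbc)
            elif method == "average":
                updated[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
            else:
                raise ValueError(f"unsupported method: {method}")
        # Drop distances involving the merged clusters, add those of the new one.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(updated)
        merges.append((clusters[a], clusters[b], height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


def cophenetic_correlation(dist, merges):
    """Pearson correlation between original and dendrogram (cophenetic) distances."""
    n = len(dist)
    coph = [[0.0] * n for _ in range(n)]
    for left, right, height in merges:
        for i in left:
            for j in right:
                coph[i][j] = coph[j][i] = height
    pairs = list(combinations(range(n), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    return cov / (var_x * var_y) ** 0.5


def cut(merges, n, k):
    """Replay the first n - k merges and return a cluster label per sample."""
    groups = [[i] for i in range(n)]
    for left, right, _ in merges[: n - k]:
        groups.remove(left)
        groups.remove(right)
        groups.append(left + right)
    labels = [0] * n
    for label, members in enumerate(groups):
        for i in members:
            labels[i] = label
    return labels


def mean_silhouette(dist, labels):
    """Average silhouette width; singleton clusters get width 0 by convention."""
    n = len(labels)
    widths = []
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist[i][j])
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(v) / len(v) for v in by_cluster.values())  # nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n


# Hypothetical distance matrix for five plots (values in 0..1, as with Gower).
dist = [
    [0.00, 0.10, 0.15, 0.80, 0.85],
    [0.10, 0.00, 0.12, 0.75, 0.82],
    [0.15, 0.12, 0.00, 0.70, 0.78],
    [0.80, 0.75, 0.70, 0.00, 0.09],
    [0.85, 0.82, 0.78, 0.09, 0.00],
]
results = {}
for method in ("single", "complete", "average"):
    merges = agglomerate(dist, method)
    labels = cut(merges, len(dist), k=2)
    results[method] = (cophenetic_correlation(dist, merges), mean_silhouette(dist, labels))
    print(method, [round(v, 3) for v in results[method]])
```

On real data one would repeat the `cut` and `mean_silhouette` steps for several values of `k` and keep the `k` with the highest mean silhouette, then compare methods by both scores. It is also worth checking the silhouette per cluster, since a high overall mean can hide one poorly separated cluster.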

Keywords [English]

  • Classification
  • clustering
  • Gower distance
  • ordinal number