A Hybrid Clustering-Classification Approach for Predicting Strength and Analyzing Material Composition of Geopolymers

YILMAZ, YILDIRAN; ÇAKMAK, TALİP; USTABAŞ, İLKER

doi:10.3390/polym18080959

YILMAZ Y., ÇAKMAK T., USTABAŞ İ.

POLYMERS, vol.18, no.8, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 18 Issue: 8
Publication Date: 2026
DOI Number: 10.3390/polym18080959
Journal Name: POLYMERS
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Chemical Abstracts Core, Compendex, INSPEC
Affiliated with Recep Tayyip Erdoğan University: Yes

Abstract

The development of geopolymers as sustainable alternative binders has been accelerated by the need to reduce the carbon footprint of cement. However, predicting their key properties, such as compressive strength, from their complex chemical composition remains a significant challenge because of the nonlinear relationships between the elemental components. Although macro-scale mixture ratios are widely used for quality control, they do not account for chemical structure, which directly affects the material's structural properties. This paper introduces a tailored hybrid machine learning framework that combines K-means clustering with classification algorithms. The method uses energy-dispersive X-ray spectroscopy (EDS) data to classify geopolymer samples into their specific mixture numbers, which allows material properties to be predicted from compositional analysis. A new dataset featuring the elemental compositions of Si, Al, Na, Ca, O, and C, as well as the critical ratios Si/Al and Ca/Si, was analyzed. The data were first clustered to identify natural compositional groups, which served as the basis for training and testing five classifiers: Random Forest (RF), Artificial Neural Networks (ANN), LightGBM, Naive Bayes (NB), and Linear Discriminant Analysis (LDA). The results showed that the hybrid method performed very well. RF achieved the highest performance, with 98% accuracy, 96% recall, 94% precision, and a 95% F1-score, when classifying samples according to their clustered groups.
SHAP (SHapley Additive exPlanations) and permutation feature importance analyses both showed that the Si/Al ratio was the most important predictive variable, followed by oxygen (O) content and silicon (Si) content. The high accuracy obtained with the K-means cluster labels indicates that the compositional data contain strong natural groups that match the target property. The framework provides a fast and dependable way to predict geopolymer properties directly from chemical composition analysis, yielding information that can improve mix design and support the production of sustainable building materials.
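
The workflow summarized in the abstract (cluster the compositions, train a classifier on the cluster labels, then rank the features) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn and NumPy, uses synthetic numbers in place of the paper's EDS dataset, and the group means, the choice of three clusters, the standardization step, and the function names `make_synthetic_eds` and `run_hybrid` are all hypothetical. Only Random Forest and permutation importance are shown; the other four classifiers and SHAP are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Feature names follow the abstract: six elements plus two derived ratios.
FEATURES = ["Si", "Al", "Na", "Ca", "O", "C", "Si/Al", "Ca/Si"]


def make_synthetic_eds(n_per_group=60, seed=0):
    """Return fake EDS-like compositions. Illustrative only, NOT the paper's data."""
    rng = np.random.default_rng(seed)
    # Assumed group means for Si, Al, Na, Ca, O, C (hypothetical values).
    means = np.array([
        [25.0, 12.0, 6.0, 3.0, 48.0, 6.0],
        [30.0, 8.0, 5.0, 8.0, 44.0, 5.0],
        [20.0, 10.0, 8.0, 14.0, 42.0, 6.0],
    ])
    elems = np.vstack([rng.normal(m, 0.8, size=(n_per_group, 6)) for m in means])
    si, al, ca = elems[:, 0], elems[:, 1], elems[:, 3]
    ratios = np.column_stack([si / al, ca / si])  # Si/Al and Ca/Si
    return np.hstack([elems, ratios])


def run_hybrid(X, n_clusters=3, seed=0):
    """Cluster, classify the cluster labels, and rank features."""
    # Step 1: K-means on standardized features yields the class labels.
    X_scaled = StandardScaler().fit_transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(X_scaled)

    # Step 2: train and test a Random Forest on those labels.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    metrics = {
        "accuracy": accuracy_score(y_te, y_pred),
        "precision": precision_score(y_te, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_te, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_te, y_pred, average="macro", zero_division=0),
    }

    # Step 3: permutation importance on the held-out samples.
    imp = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=seed)
    ranking = sorted(
        zip(FEATURES, imp.importances_mean), key=lambda p: p[1], reverse=True
    )
    return labels, metrics, ranking


if __name__ == "__main__":
    X = make_synthetic_eds()
    labels, metrics, ranking = run_hybrid(X)
    print(metrics)
    for name, score in ranking:
        print(f"{name}: {score:.3f}")
```

Because the synthetic groups are deliberately well separated and the features are strongly correlated, the scores and the feature ranking from this sketch say nothing about the values reported in the paper; they only show how the three steps connect.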