Abstract

Missing data is a common problem in data processing and can degrade the quality of analysis results if it is not handled. This study evaluates the performance of two imputation methods, Random Forest (RF) imputation and Classification and Regression Trees (CART), at missing-value proportions of 5%, 10%, 15%, and 20%. The data consist of 200 observations of two variables generated from a bivariate Gamma distribution in RStudio. Performance was evaluated using the correlation between the imputed and original data and two error measures, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE). At 5% and 10% missing values, CART produced the smallest MAPE and RMSE and was therefore the best method, although at 10% its difference from RF was not significant. At 15% and 20% missing values, RF performed better, with smaller MAPE and RMSE than CART. Overall, CART is better suited to low proportions of missing values, while RF performs more stably at high proportions. These results offer guidance for choosing an imputation method according to the level of missing data.
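
The simulation design summarized above (bivariate Gamma generation, random deletion of values, tree-based imputation) can be sketched in Python as follows. This is not the authors' R code: the trivariate-reduction generator, the shape parameters, deleting values only from the second variable under a missing-completely-at-random (MCAR) mechanism, the tree depth and leaf size, and the number of trees are all illustrative assumptions, and every function name is hypothetical. Because there is only one predictor, the random-forest step reduces to bagged regression trees (there is no random feature subsetting).

```python
import random


def bivariate_gamma(n, a0, a1, a2, scale=1.0, seed=None):
    """Assumed generator (trivariate reduction): X = Y0 + Y1, Y = Y0 + Y2 with
    independent Gamma(a_i, scale) components, so X ~ Gamma(a0 + a1, scale),
    Y ~ Gamma(a0 + a2, scale) and Corr(X, Y) = a0 / sqrt((a0 + a1) * (a0 + a2))."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y0 = rng.gammavariate(a0, scale)  # shared component induces correlation
        data.append((y0 + rng.gammavariate(a1, scale),
                     y0 + rng.gammavariate(a2, scale)))
    return data


def mcar_mask(n, prop, seed=None):
    """Indices of round(prop * n) rows whose Y is deleted completely at random."""
    return sorted(random.Random(seed).sample(range(n), round(prop * n)))


def fit_tree(xs, ys, max_depth=3, min_leaf=5):
    """CART-style regression tree on one predictor: each split minimises the
    total within-node sum of squared errors; leaves hold the mean response."""
    n = len(xs)
    if max_depth == 0 or n < 2 * min_leaf:
        return sum(ys) / n
    pairs = sorted(zip(xs, ys))
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    s = sq = 0.0
    best = None
    for i in range(1, n):
        y = pairs[i - 1][1]
        s, sq = s + y, sq + y * y  # running sums for the left node
        if i < min_leaf or n - i < min_leaf or pairs[i - 1][0] == pairs[i][0]:
            continue  # leaf too small, or split would separate tied x values
        sse = (sq - s * s / i) + ((total_sq - sq) - (total - s) ** 2 / (n - i))
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:
        return sum(ys) / n
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left, right = pairs[:i], pairs[i:]
    return (threshold,
            fit_tree([x for x, _ in left], [y for _, y in left], max_depth - 1, min_leaf),
            fit_tree([x for x, _ in right], [y for _, y in right], max_depth - 1, min_leaf))


def predict(tree, x):
    while isinstance(tree, tuple):  # internal node: (threshold, left, right)
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(xs, ys, n_trees=30, seed=None):
    """Trees grown on bootstrap resamples; with one predictor this is bagging."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def impute(data, missing, method="cart", seed=None):
    """Single imputation of the masked Y values from X, fitted on complete rows."""
    miss = set(missing)
    xs = [x for i, (x, _) in enumerate(data) if i not in miss]
    ys = [y for i, (_, y) in enumerate(data) if i not in miss]
    if method == "cart":
        tree = fit_tree(xs, ys)
        return [predict(tree, data[i][0]) for i in missing]
    trees = fit_forest(xs, ys, seed=seed)
    return [sum(predict(t, data[i][0]) for t in trees) / len(trees) for i in missing]


if __name__ == "__main__":
    data = bivariate_gamma(200, a0=2.0, a1=1.0, a2=1.0, seed=1)  # hypothetical shapes
    for prop in (0.05, 0.10, 0.15, 0.20):
        missing = mcar_mask(len(data), prop, seed=2)
        truth = [data[i][1] for i in missing]  # deleted values, kept for scoring
        for method in ("cart", "rf"):
            est = impute(data, missing, method, seed=3)
            rmse = (sum((t - e) ** 2 for t, e in zip(truth, est)) / len(truth)) ** 0.5
            print(f"{prop:.0%} missing, {method}: RMSE = {rmse:.3f}")
```

Keeping the deleted values in `data` (but excluding them when fitting) is what lets the imputed values be scored against the originals, mirroring the comparison with the original data described in the abstract.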

Keywords

Bivariate Gamma; Random Forest Imputation; Classification and Regression Trees; Root Mean Square Error; Mean Absolute Percentage Error; Correlation; T-Test
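
The evaluation criteria named in the abstract and keywords can be written down directly. The sketch below uses the standard definitions of MAPE (in percent) and RMSE over the imputed cells, the Pearson correlation between original and imputed values, and the usual t statistic for testing whether a correlation differs from zero. Whether this last statistic is the t-test used in the study is an assumption, the function names are hypothetical, and the toy numbers are not data from the study. MAPE is undefined when an original value is zero, which cannot happen for Gamma-distributed data.

```python
import math


def mape(actual, imputed):
    """Mean absolute percentage error in percent; undefined if an actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, imputed)) / len(actual)


def rmse(actual, imputed):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual))


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def corr_t_stat(r, n):
    """t statistic for H0: rho = 0, compared against t with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


actual = [2.0, 4.0, 5.0, 8.0]    # toy values, not data from the study
imputed = [2.5, 3.5, 5.0, 9.0]
print(mape(actual, imputed))      # 12.5
print(rmse(actual, imputed), pearson_r(actual, imputed))
```

RMSE squares each error, so it penalizes a few large imputation errors more heavily than MAPE, which weights each error relative to the size of the original value.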

How to Cite
Arib, M. A. A., Khaola, K. R. A., & Rido, M. R. W. (2025). Pananganan Data Hilang pada Data Bangkitan Bivariate Gamma [Handling Missing Data in Generated Bivariate Gamma Data]. Diophantine Journal of Mathematics and Its Applications, 4(2), 65–73. https://doi.org/10.33369/diophantine.v4i2.46691
