Clustering Open Source Software Document Artifacts with Paragraph Vectors (Pengelompokan Artefak Dokumen Perangkat Lunak Open Source Dengan Vektor Paragraf)

Guntur Budi Herwanto

doi:10.33369/pseudocode.6.2.181-185

Submitted: October 17, 2019

Accepted: October 27, 2019

Published: October 28, 2019


Abstract

Open source software has grown steadily in recent years. The growth covers not only finished software products but also software components and libraries, which expand every year. GitHub is one of the most popular places to publish open source projects. The availability of such a large dataset gives software development researchers an opportunity to advance their work. The growing variety of software artifacts, however, makes supervised methods difficult to apply. This study attempts to group documents in an unsupervised way, using K-Means clustering on a paragraph vector representation. This step is a precursor to building a classification model, which requires supervision when labeling its documents. The clustering results show that the documents can be grouped into several clusters, with the best result observed at k = 6.

Keywords: document clustering, doc2vec, k-means clustering, software artifacts.
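
The abstract names the pipeline (paragraph vectors fed to K-Means) without giving implementation details. The following is a minimal sketch of the clustering step only, under stated assumptions: the two-dimensional `doc_vectors` are made-up stand-ins for paragraph vectors (in practice they would come from a paragraph vector model such as doc2vec), the function names are hypothetical, and the initial centroids are simply the first k vectors so that the run is deterministic. It is not the author's implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k, max_iterations=100):
    """Lloyd's K-Means iteration. Returns (labels, centroids).

    Assumption for illustration: the first k vectors serve as the
    initial centroids, which keeps the example deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [None] * len(vectors)
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical stand-ins for paragraph vectors of six documents.
doc_vectors = [
    [0.9, 0.1], [1.0, 0.2], [0.8, 0.0],  # documents of one kind
    [0.1, 0.9], [0.0, 1.0], [0.2, 0.8],  # documents of another kind
]

labels, centroids = kmeans(doc_vectors, k=2)
print(labels)
```

On this toy input the first three documents end up in one cluster and the last three in the other. Real paragraph vectors have many more dimensions, and a random or k-means++ style initialization with several restarts is usually preferable to taking the first k vectors.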

License

All material on this site is protected by law. You are welcome to quote part or all of the content of this website in accordance with the applicable provisions.
If you find one or more articles in Jurnal Pseudocode that infringe, or may infringe, a copyright you hold, please report it to us by email through the Principal Contact.
The formal legal terms governing access to all information and articles on this journal site follow the Creative Commons Attribution-ShareAlike (CC-BY-SA) license.
All information in Jurnal Pseudocode is academic in nature. Jurnal Pseudocode is not responsible for any loss arising from misuse of information from this site.

Author Biography

Guntur Budi Herwanto, Department of Computer Science, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada

Computer Science, Department of Computer Science and Electronics,
Faculty of Mathematics and Natural Sciences

How to Cite

Herwanto, G. B. (2019). Pengelompokan Artefak Dokumen Perangkat Lunak Open Source Dengan Vektor Paragraf. Pseudocode, 6(2), 181–185. https://doi.org/10.33369/pseudocode.6.2.181-185

