PERFORMANCE COMPARISON OF PRE-PROCESSING METHODS IN AUTOMATIC PATENT DOCUMENT CLASSIFICATION

Budi Nugroho, Asep Denih

Abstract


This paper analyzes and compares the performance of several pre-processing methods used in automatic patent classification with graph kernels for Support Vector Machines (SVM). The pre-processing methods are data transform techniques: data scaling, data centering, data standardization, data normalization, the Box-Cox transform, and the Yeo-Johnson transform. The automatic patent classifier assigns an input patent citation graph to one of 10 possible classes of the International Patent Classification (IPC). The input is taken with various background conditions. The experiments show that data normalization gives the best result, with a classification accuracy of up to 85.33.15% for the KEHL and 93.80% for the KVHL. In contrast, for the KEHG, applying pre-processing decreased the accuracy.
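The six transforms named above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the abstract does not define the transforms, so the sketch assumes the common column-wise definitions (scaling divides by the sample standard deviation, normalization is min-max rescaling to [0, 1]) and takes the Box-Cox and Yeo-Johnson parameter `lam` as a fixed input instead of estimating it. All function names and the toy matrix `X` are hypothetical.

```python
import numpy as np


def center(x):
    """Subtract each column's mean."""
    return x - x.mean(axis=0)


def scale(x):
    """Divide each column by its sample standard deviation (assumed definition)."""
    return x / x.std(axis=0, ddof=1)


def standardize(x):
    """Center, then scale: zero mean and unit sample standard deviation."""
    return scale(center(x))


def normalize(x):
    """Min-max rescale each column to [0, 1] (assumed definition)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)


def box_cox(x, lam):
    """Box-Cox transform for strictly positive data, with a fixed lambda."""
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x, lam):
    """Yeo-Johnson transform; unlike Box-Cox it accepts zero and negative values."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if lam != 0:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])
    if lam != 2:
        out[~pos] = -(((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam))
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


# Hypothetical toy feature matrix: rows are samples, columns are features.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
X_norm = normalize(X)
X_std = standardize(X)
```

In a real pipeline the transform parameters (means, ranges, lambda) would be fitted on the training split only and then applied unchanged to the test split, so that no information leaks from the test data into the classifier.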


Keywords


pre-processing, graph kernels, support vector machine, automatic patent classification


DOI: 10.33751/komputasi.v17i2.2148


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.