Search Results

Found 3 documents matching the query

Karina Chandra Dewi

Analisis akurasi model random forest untuk big data-studi kasus prediksi klaim severity pada asuransi mobil = Analyzing accuracy of random forest model for big data-A case study of claim severity prediction in car insurance

"Insurance claims are a key element of the insurance business. Claim severity refers to the amount of money that must be paid out to repair the damage incurred. The size of an insurance claim is influenced by many factors, which makes the volume of data very large, so a suitable method is needed to predict claim severity on large data. One such method is Random Forest, a machine learning method. This thesis applies the Random Forest model to predict claim severity in car insurance and analyzes how the number of features used in the model affects its accuracy, as an alternative solution for Big Data. The simulation results show that the Random Forest model can be applied to claim severity prediction, which is a regression problem in the context of machine learning. Using only 1⁄3 of all available features, the Random Forest model achieves accuracy comparable to that obtained when all features are used to build the model, namely around 99%. This result shows that Random Forest scales well, particularly with respect to the number of features. Hence, the Random Forest model can be used as a solution to Big Data problems related to data volume."
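As a rough illustration of the idea in this abstract, the sketch below grows a small random forest for regression in plain Python and compares a forest that considers one third of the features at each split with one that considers all of them. It is not the thesis's implementation. The reading of "1⁄3 of the features" as the number of candidate features per split (`mtry`), the synthetic data, and every function name here are assumptions made for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y, features):
    """Best (feature, threshold) among the candidate features, or None."""
    best, best_cost = None, float("inf")
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if left and right:
                cost = sse(left) + sse(right)
                if cost < best_cost:
                    best, best_cost = (f, t), cost
    return best

def grow_tree(X, y, mtry, depth, rng):
    """Regression tree; each split looks at only `mtry` randomly chosen features."""
    if depth == 0 or len(y) < 2:
        return mean(y)  # leaf: predict the mean target
    split = best_split(X, y, rng.sample(range(len(X[0])), mtry))
    if split is None:
        return mean(y)
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], mtry, depth - 1, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], mtry, depth - 1, rng))

def predict_tree(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are floats
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fit_forest(X, y, n_trees, mtry, depth=4, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, depth, rng))
    return forest

def predict_forest(forest, row):
    return mean([predict_tree(tree, row) for tree in forest])

def mse(forest, X, y):
    return mean([(predict_forest(forest, row) - yi) ** 2 for row, yi in zip(X, y)])

# Synthetic stand-in for claim data: 6 features, 3 of them informative.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(6)] for _ in range(160)]
y = [3 * r[0] + 2 * r[1] + r[2] + data_rng.gauss(0, 0.1) for r in X]
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]

for mtry in (2, 6):  # 2 is one third of the 6 features
    forest = fit_forest(X_train, y_train, n_trees=20, mtry=mtry)
    print(f"mtry={mtry}: test MSE = {mse(forest, X_test, y_test):.3f}")
```

The two printed errors can be compared directly. How close they are depends on the data, so this toy run says nothing about the 99% figure reported in the abstract.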

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2019

T54306

UI - Master's Thesis (Membership)  Universitas Indonesia Library

Yuni Rosita Dewi

Analisis akurasi metode seleksi fitur gram-schmidt orthogonalization pada support vector machine untuk big data-studi kasus masalah prediksi klaim = Analyzing accuracy of feature selection method gram-schmidt orthogonalization for support vector machine on big data-a case study of claim prediction problem

"Claim prediction is an important process in the insurance industry because it lets insurance companies prepare the right type of insurance policy for each potential policyholder. Claim prediction is being carried out more and more frequently, and the resulting data is large enough, in both the number of features and the number of policyholders, to be called big data. One way for an insurance company to predict whether a policyholder will file a claim is machine learning, which has been shown to work for classification and prediction. The number of features can be reduced through feature selection, that is, by ranking the features according to their importance. The feature selection method used here is Gram-Schmidt Orthogonalization. This method was previously used for unstructured data; in this research it is tested on large-volume structured data. A Support Vector Machine is used to evaluate the feature ranking obtained from feature selection because it is a popular machine learning method for classification. Based on the simulation results, the ranking obtained from Gram-Schmidt Orthogonalization is relatively consistent. The ranking also identifies the features that most influence whether a policyholder files a claim. The simulation further shows that using only about 26% of the features yields accuracy comparable to using all features."
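The feature ranking described in this abstract can be sketched as follows. This is one common formulation of Gram-Schmidt feature ranking and may differ in detail from the thesis: at each step it picks the feature whose column has the largest squared cosine with the target, then removes that column's direction from the remaining columns and from the target. The function names, the assumption that columns are already centered, and the toy data are all illustrative. The Support Vector Machine evaluation step is not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def remove_component(v, q):
    """Return v minus its projection onto q."""
    scale = dot(v, q) / dot(q, q)
    return [a - scale * b for a, b in zip(v, q)]

def gram_schmidt_ranking(columns, target, eps=1e-12):
    """Rank feature columns (assumed centered) from most to least relevant."""
    cols = {j: list(c) for j, c in enumerate(columns)}
    y = list(target)
    ranking = []
    while cols:
        scores = {}
        for j, c in cols.items():
            denom = dot(c, c) * dot(y, y)
            # Squared cosine between the feature column and the current target.
            scores[j] = dot(c, y) ** 2 / denom if denom > eps else 0.0
        best = max(scores, key=scores.get)
        q = cols.pop(best)
        ranking.append(best)
        if dot(q, q) > eps:
            # Orthogonalize everything that is left against the chosen column.
            cols = {j: remove_component(c, q) for j, c in cols.items()}
            y = remove_component(y, q)
    return ranking

# Toy data: the target depends strongly on x0, weakly on x1, not at all on x2.
x0 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x1 = [1.0, -1.0, 0.0, -1.0, 1.0]
x2 = [1.0, 1.0, -4.0, 1.0, 1.0]
target = [3 * a + b for a, b in zip(x0, x1)]

print(gram_schmidt_ranking([x2, x1, x0], target))
```

Because each chosen column is projected out of the others, a feature that duplicates an already selected one scores zero afterwards and drops to the end of the ranking. A classifier can then be trained on the top-ranked fraction of features.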

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2019

T54313

UI - Master's Thesis (Membership)  Universitas Indonesia Library

Christhoper Nugraha

Pendeteksian topik pada twitter menggunakan online eigenspace-based fuzzy c-means clustering untuk big data = Topic detection on twitter using online eigenspace-based fuzzy c-means clustering for big data / Christhoper Nugraha

"ABSTRACT

Topic detection is the process of analyzing a collection of textual data to determine the topics it contains. One clustering method that can be used for topic detection is Fuzzy C-Means (FCM). However, plain FCM is not effective for topic detection on big data because it requires a long running time and a large amount of memory. Plain FCM also has another problem: when applied to high-dimensional data, it produces only one topic. To overcome these problems, this study proposes Single-Pass Eigenspace-Based Fuzzy C-Means (SPEFCM), a combination of Single-Pass Fuzzy C-Means (SPFCM) and Eigenspace-Based Fuzzy C-Means (EFCM). The data used for topic detection are tweets from the Twitter application. The accuracy of the topics obtained with SPEFCM and EFCM is compared using coherence values. The simulation results show that the topic coherence obtained with SPEFCM is comparable to that of EFCM. This shows that SPEFCM is a suitable method for detecting topics in big data without reducing the quality of the topics produced."
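For readers unfamiliar with the base method, here is a minimal sketch of plain Fuzzy C-Means in standard-library Python. It covers only the basic FCM updates, not the eigenspace projection of EFCM or the single-pass processing of SPEFCM. The function names, the explicit initial centers, the fuzzifier `m=2.0`, and the toy points are assumptions made for the example.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def memberships(point, centers, m):
    """Membership of one point in each cluster; the values sum to 1."""
    d = [sq_dist(point, c) for c in centers]
    if 0.0 in d:  # the point sits exactly on a center
        hit = d.index(0.0)
        return [1.0 if k == hit else 0.0 for k in range(len(centers))]
    power = 1.0 / (m - 1.0)  # distances are squared, so the exponent is 1/(m-1)
    return [1.0 / sum((dk / dj) ** power for dj in d) for dk in d]

def fuzzy_c_means(points, init_centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = [list(c) for c in init_centers]
    dims = len(points[0])
    for _ in range(n_iter):
        U = [memberships(p, centers, m) for p in points]
        for k in range(len(centers)):
            w = [u[k] ** m for u in U]
            total = sum(w)
            if total > 0:  # keep the old center if no point has weight in this cluster
                centers[k] = [sum(wi * p[d] for wi, p in zip(w, points)) / total
                              for d in range(dims)]
    return centers, [memberships(p, centers, m) for p in points]

# Toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, U = fuzzy_c_means(points, init_centers=[points[0], points[-1]])
for p, u in zip(points, U):
    print(p, [round(v, 3) for v in u])
```

Every iteration touches all points and all centers, which is the time and memory cost the abstract refers to for big data. In topic detection the points would be document vectors and the cluster centers would be read as topics.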

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2019

S-pdf

UI - Undergraduate Thesis (Membership)  Universitas Indonesia Library
