Search Results

Found 145137 documents matching the query

Ryan Fathurrachman

Model Ensemble Random Forest dan Support Vector Machine untuk Mendeteksi Penyakit Pneumonia Menggunakan Data Sekuens Protein = Ensemble Model of Random Forest and Support Vector Machine for Detecting Pneumonia Using Protein Sequence Data

"Acute respiratory infection (ARI) is an infection that attacks the respiratory tract, whether upper or lower. One of the diseases classified as ARI is pneumonia, a lung infection that can seriously affect human health. Pneumonia affects the lower part of the lungs and causes the area to fill with mucus or pus. It can be caused by various pathogens, including viruses, bacteria, and fungi. The bacterium that most commonly causes pneumonia is Streptococcus pneumoniae. In addition, Mycobacterium tuberculosis is a bacterial cause of pneumonia in several Asian countries. On radiological results, pneumonia resembles tuberculous pneumonia. Early diagnosis is crucial for the management and effective treatment of this disease. With advances in bioinformatics, protein sequences have become a promising approach for detecting pneumonia quickly and accurately. This research therefore focuses on detecting pneumonia from protein sequences. Feature extraction with the discere method is required to convert the sequences into numerical data. The research uses an ensemble of Random Forest and Support Vector Machine (SVM) models combined with the weighted majority algorithm (WMA) to detect pneumonia using protein sequences of Streptococcus pneumoniae, with Mycobacterium tuberculosis sequences for comparison; the sequences were obtained from the UniProt website. The results show that the ensemble of Random Forest and SVM with WMA achieved the best performance at a training-to-testing data ratio of 80:20, with 99.17% accuracy, 99.65% sensitivity, 97.56% specificity, and a ROC-AUC of 98.61%."
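
The abstract above does not say how the weighted majority algorithm is configured, so the following is only a minimal sketch of the textbook scheme: each model starts with weight 1, the ensemble predicts the label with the largest total weight, and every model that errs has its weight multiplied by a penalty factor. The function names, the penalty `beta = 0.5`, and the toy prediction stream are assumptions made for illustration, not values from the thesis.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    Ties go to the label that appears first in `predictions`.
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def update_weights(predictions, weights, true_label, beta=0.5):
    """Multiply the weight of every wrong model by beta (0 < beta < 1)."""
    return [w * beta if p != true_label else w
            for p, w in zip(predictions, weights)]


# Hypothetical stream: ((random_forest_label, svm_label), true_label)
stream = [((1, 0), 1), ((1, 1), 1), ((0, 1), 0), ((1, 0), 1)]

weights = [1.0, 1.0]  # one weight per model, all equal at the start
ensemble_predictions = []
for preds, truth in stream:
    ensemble_predictions.append(weighted_majority_vote(preds, weights))
    weights = update_weights(preds, weights, truth)

print(ensemble_predictions, weights)
```

In this toy stream the second model is wrong three times, so its weight shrinks and the first model dominates later votes.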

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Dian Puspita Sari

Klasifikasi sekuens protein Coronavirus penyebab COVID-19 menggunakan metode Particle Swarm Optimization-Support Vector Machine dan Seleksi Fitur Random Forest-Recursive Feature Elimination = Classification of coronavirus protein sequences cause COVID-19 disease using Particle Swarm Optimization-Support Vector Machine Method and Feature Selection of Random Forest-Recursive Feature Elimination

"Coronaviruses are a group of viruses that infect the respiratory system and can cause mild or severe respiratory infections. One member of this group is SARS-CoV-2, and the disease it causes is called COVID-19. COVID-19 was first detected in 2019 in Wuhan, China. It spread very quickly, with a high mortality rate across many countries, so the disease has pandemic status. This thesis addresses the classification of the SARS-CoV-2 virus using coronavirus protein sequence data. Features of the protein sequence data are selected with the Random Forest-Recursive Feature Elimination (RF-RFE) method. After feature selection, classification is carried out with a machine learning approach using the Support Vector Machine (SVM) and Particle Swarm Optimization-Support Vector Machine (PSO-SVM) methods. For SVM, the best average accuracy, specificity, and sensitivity are 93.43%, 98.06%, and 88.84%, respectively, with 80% of the data used for training. For PSO-SVM, the best average accuracy and specificity are 98.48% and 98.57% with 80% training data, while the best average sensitivity is 98.96% with 90% training data. The study therefore concludes that PSO-SVM performs better than SVM."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Antonius Rangga Hapsoro Wicaksono

Klasifikasi Kanker Serviks Menggunakan Metode Stacking Classifier Random Forest-Decision Tree-Support Vector Machine = Cervical Cancer Classification Using Stacking Classifier Random Forest-Decision Tree-Support Vector Machine

"Cancer is a leading cause of death worldwide, with 18.1 million cases and 10 million deaths in 2020. In Indonesia, there were 396,914 cases and 235,511 deaths. Cervical cancer is the fourth most common cancer globally and the second most common in Indonesia. Death rates are higher in low- and middle-income countries because of limited access to preventive measures. Cervical cancer is often difficult to detect until it reaches an advanced stage, and machine learning is one approach to early detection. This research applies a stacking classifier that combines decision tree, support vector machine, and random forest models as first-level learners, with logistic regression as the meta learner, to classify patients with and without cervical cancer. The dataset, from the UCI Repository, contains data from 858 patients at risk for cervical cancer at Hospital Universitario de Caracas in Venezuela. The data were split into 70% for training and 30% for testing, with five random trials. The model achieved an average accuracy of 95.03%, precision of 99.05%, sensitivity of 95.49%, specificity of 89.39%, and G-mean of 92.37%. Although the stacking ensemble performed well, single-classifier models performed slightly better, though the difference was not significant."
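
The metrics reported above (accuracy, precision, sensitivity, specificity, G-mean) all derive from the four cells of a binary confusion matrix. The sketch below shows the usual definitions, with G-mean taken as the geometric mean of sensitivity and specificity. The function name and the counts are hypothetical and are not the thesis data; the abstract reports averages over five trials, so its G-mean need not equal the value computed from its averaged sensitivity and specificity.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow
m = binary_metrics(tp=90, fp=5, tn=45, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

G-mean is useful on imbalanced data because it drops sharply when either class is poorly recognised, whereas accuracy can stay high.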

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Ricco Yhandy Fernando

Implementasi Sistem Klasifikasi Penyakit Paru-Paru Dari Data Screening Menggunakan Metode Support Vector Machine Dan Ensemble Bagging Gaussian Naive Bayes = Implementation Of A Lung Disease Classification System Using The Support Vector Machine And Ensemble Bagging Gaussian Naïve Bayes Methods.

"Lung disease is a serious disorder that attacks the human respiratory system and can be fatal if not treated properly. At present, lung disease is still detected manually by specialist doctors, and the manual process takes a long time. This research therefore builds a system that detects and classifies lung diseases automatically. Two methods are used: Support Vector Machine and Ensemble Bagging Gaussian Naïve Bayes. The data are screening records for one hundred patients, obtained as primary data from a hospital in Yogyakarta. The study uses twelve lung symptoms and classifies cases into five lung disease classes: tuberculosis, chronic obstructive pulmonary disease, pneumonia, bronchial asthma, and lung cancer. The classification system is implemented in the PHP programming language. Classification performance is tested with a confusion matrix, and the application is tested with the System Usability Scale. In confusion matrix testing, Support Vector Machine obtained 93.9% accuracy, 92% recall, 79% precision, and an F1 score of 54%, while Ensemble Bagging Gaussian Naïve Bayes obtained 88.9% accuracy, 92% recall, 79% precision, and an F1 score of 54%. System testing with the System Usability Scale gave a score of 73, or grade B."

Depok: Fakultas Teknik Universitas Indonesia, 2024

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Dilla Fadlillah Salma

Analisis akurasi metode support vector machine, random forest, dan logistic regression dalam mengklasifikasi data asuransi mobil dengan implementasi metode seleksi fitur one dimensional naive bayes classifier = Accuracy analysis of support vector machine, random forest, and logistic regression method in classifying car insurance data with one dimensional naive bayes classifier features selection implementation

"Owning and using a car carries various risks, such as accidents. To reduce the burden of these risks, companies sell car insurance products. Car insurance is a vehicle insurance product intended to protect car owners from financial losses on their insured vehicles. To offer insurance products, some companies use cold calling as a sales technique. This technique sells insurance more effectively if prospective customers are first predicted or classified into a buying or not-buying class.
In this thesis, classification is performed with the Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) methods, with the One Dimensional Naïve Bayes Classifier (1-DBC) as the feature selection method. The data consist of 4000 records with a total of 18 features. The results show that SVM has higher accuracy than the other two methods. In addition, the feature selection method succeeded in increasing the accuracy of Random Forest and Logistic Regression. With 1-DBC, all three classification methods obtained their highest accuracy when 15 features were used."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2018

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library

Nadia Hartini Kusumawijaya

Komparasi Kinerja Metode Random Forest Regression dengan Metode Support Vector Regression untuk Memprediksi Usia Biologis pada Data Pemeriksaan Medis = Comparison of the Performance of the Random Forest Regression Method with the Support Vector Regression Method for Predicting Biological Age on Medical Examination Data

"Aging is one of the main risk factors for disease and death. The aging rate of individuals with the same chronological age has been shown to vary. A need therefore arises for a measure of aging that is more accurate, robust, and reliable than chronological age, namely biological age. In this research, the author builds models using the Random Forest Regression (RF) method and the Support Vector Regression (SVR) method to predict biological age from medical examination data, evaluates their performance, and compares the two methods. RF applies ensemble learning by combining several decision trees to produce predictions. SVR works by building a hyperplane or set of hyperplanes in a high-dimensional space, which can be used for linear or nonlinear regression. The dataset is medical data from the Ministry of Health of the Republic of Indonesia. Preprocessing covers missing value handling, encoding, and outlier detection and handling. Feature selection is then carried out using Spearman's rank correlation coefficient. After that, an RF model and an SVR model are built separately for each sex. Finally, model performance is evaluated and compared using Root Mean Square Error (RMSE), the coefficient of determination (R²), adjusted R², and running time. The best RF hyperparameters are {'max_depth': 15, 'n_estimators': 1150} for the male dataset and {'max_depth': 15, 'n_estimators': 1250} for the female dataset. The best SVR hyperparameters are {'C': 2, 'epsilon': 0.2, 'gamma': 'scale', 'kernel': 'rbf', 'tol': 0.005} for the male dataset and {'C': 3, 'epsilon': 0.2, 'gamma': 'scale', 'kernel': 'rbf', 'tol': 0.005} for the female dataset. RF performs fairly well, with RMSE = 7.532, R² = 0.403, adjusted R² = 0.351, and running time = 0.154 for men, and RMSE = 6.889, R² = 0.340, adjusted R² = 0.264, and running time = 0.179 for women. SVR performs similarly but slightly worse, with RMSE = 7.692, R² = 0.376, adjusted R² = 0.321, and running time = 0.035 for men, and RMSE = 6.905, R² = 0.337, adjusted R² = 0.306, and running time = 0.080 for women. Based on the model performance analysis in this research, the model built with the Random Forest Regression method is superior to the Support Vector Regression method in predicting biological age."
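
The abstract above reports both R² and adjusted R² but gives neither the sample size nor the number of selected features. As a reminder of how the two are related, the sketch below uses the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The function name and the example values of n and p are hypothetical and were not taken from the thesis.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2: penalises R^2 for the number of predictors used."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical example: R^2 = 0.5 on 101 samples with 10 features
print(round(adjusted_r2(0.5, n_samples=101, n_features=10), 4))
```

Because the penalty grows with the number of features, adjusted R² is always at most R² when at least one predictor is used, which matches the direction of every pair reported in the abstract.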

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Afifah Rofi Laeli

Komparasi model convolutional neural network-random forest dan convolutional neural network-XGBoost untuk mendeteksi tuberkulosis paru berdasarkan data radiografi toraks = Comparison of convolutional neural network-random forest and convolutional neural network-XGBoost models for detecting pulmonary tuberculosis based on thorax radiography data

"Tuberculosis (TB) is an infectious disease that in most cases attacks the human lungs. The disease is transmitted when a patient with pulmonary tuberculosis expels droplets of phlegm containing tuberculosis germs into the air. Its ease of transmission makes tuberculosis a public health problem, both in Indonesia and internationally. Early detection of pulmonary tuberculosis can prevent transmission and cure patients. However, the current COVID-19 pandemic can reduce the number of tuberculosis cases that are successfully detected, which shows the need for progress in detection methods for pulmonary tuberculosis. Technological developments can now be used to support the health sector, one of which is machine learning, which can detect the presence of a disease from image data. In this study, two machine learning models, Convolutional Neural Network-Random Forest (CNN-Random Forest) and Convolutional Neural Network-XGBoost (CNN-XGBoost), were implemented to detect pulmonary tuberculosis from thorax radiography images. The two models were then evaluated and compared on accuracy and on the area under the ROC curve, commonly called the area under the curve (AUC). The data comprised 6000 images: 3000 thorax radiographs of pulmonary tuberculosis and 3000 normal thorax radiographs. The results show that both CNN-Random Forest and CNN-XGBoost performed well and can be applied to detect pulmonary tuberculosis. In both models, the CNN extracts features from the image, and the extracted features become the input to the Random Forest and XGBoost classifiers.
On average accuracy and average AUC, the CNN-Random Forest model gave best results of 98.667% and 99.933%, respectively, while the CNN-XGBoost model gave best results of 98.367% and 99.866%, respectively. Based on this comparison, the CNN-Random Forest model performed better than the CNN-XGBoost model in detecting pulmonary tuberculosis."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Cari yang mirip

Tambahkan ke Favorit

Metadata PDF

Abstrak PDF

Abstrak

Afifah Rofi Laeli

Komparasi Model Convolutional Neural Network-Random Forest dan Convolutional Neural Network-XGBoost untuk mendeteksi tuberkulosis paru berdasarkan data radiografi toraks = Comparison of Convolutional Neural Network-Random Forest and Convolutional Neural Network-XGBoost Models for detecting pulmonary tuberculosis based on thorax radiography data

"Tuberkulosis (TB) merupakan suatu penyakit menular yang sebagian besar menyerang paru-paru manusia. Penularan penyakit ini terjadi ketika pasien tuberkulosis paru mengeluarkan percikan dahak yang mengandung kuman tuberkulosis ke udara. Penularannya yang mudah menjadikan tuberkulosis sebagai masalah kesehatan masyarakat, baik di Indonesia maupun internasional. Deteksi dini tuberkulosis paru dapat mencegah penularan serta menyembuhkan pasien. Namun, adanya pandemi COVID-19 saat ini dapat menurunkan angka kasus tuberkulosis yang berhasil terdeteksi. Hal ini menunjukkan perlu adanya kemajuan dalam metode pendeteksian penyakit tuberkulosis paru. Kini, perkembangan teknologi dapat dimanfaatkan untuk membantu bidang kesehatan, salah satunya dengan machine learning. Machine learning dapat digunakan untuk mendeteksi adanya suatu penyakit berdasarkan data citra. Dalam penelitian ini, model machine learning, Convolutional Neural Network–Random Forest (CNN– Random Forest) dan Convolutional Neural Network–XGBoost (CNN–XGBoost), diimplementasikan untuk mendeteksi tuberkulosis paru berdasarkan citra radiografi toraks. Selanjutnya, kedua model tersebut dievaluasi dan dibandingkan kinerjanya berdasarkan nilai akurasi dan nilai luas wilayah di bawah kurva ROC, atau biasa disebut dengan area under the curve (AUC). Data yang digunakan sebanyak 6000 yang terdiri dari 3000 citra radiografi toraks tuberkulosis paru dan 3000 citra radiografi toraks normal. Berdasarkan hasil yang diperoleh, model CNN-Random Forest dan CNN-XGBoost memberikan kinerja yang baik dan dapat diterapkan untuk mendeteksi tuberkulosis paru, dimana CNN digunakan untuk mengekstraksi fitur pada citra, kemudian hasil ekstraksi fitur tersebut menjadi input bagi pengklasifikasi Random Forest dan XGBoost. 
Evaluasi kinerja berdasarkan rata-rata nilai akurasi dan rata-rata nilai AUC pada model CNN- Random Forest memberikan hasil terbaik masing-masing sebesar 98.667% dan 99.933%, sementara pada model CNN-XGBoost memberikan hasil terbaik masing-masing sebesar 98.367% dan 99.866%. Kemudian berdasarkan perbandingan kinerja yang dilakukan, model CNN-Random Forest memberikan kinerja yang lebih baik dalam mendeteksi tuberkulosis paru dibandingkan dengan model CNN-XGBoost.
Tuberculosis (TB) is an infectious disease that in most cases attacks the lungs. It is transmitted when a patient with pulmonary tuberculosis expels phlegm containing tuberculosis germs into the air. Because it spreads so easily, tuberculosis is a public health problem both in Indonesia and internationally. Early detection of pulmonary tuberculosis can prevent transmission and help cure patients. However, the COVID-19 pandemic may reduce the number of tuberculosis cases that are successfully detected, which shows the need for better detection methods. Technological developments such as machine learning can now support the health sector; machine learning can detect the presence of a disease from image data. In this study, two machine learning models, Convolutional Neural Network-Random Forest (CNN-Random Forest) and Convolutional Neural Network-XGBoost (CNN-XGBoost), were implemented to detect pulmonary tuberculosis from thorax radiography images. The performance of the two models was evaluated and compared using accuracy and the area under the ROC curve, commonly called the area under the curve (AUC). The data comprised 6000 thorax radiography images: 3000 of pulmonary tuberculosis and 3000 normal. In both models, the CNN extracts features from the image, and the extracted features become the input to the Random Forest or XGBoost classifier. Both models performed well and can be applied to detect pulmonary tuberculosis. Based on average accuracy and average AUC, the CNN-Random Forest model gave best results of 98.667% and 99.933%, respectively, while the CNN-XGBoost model gave best results of 98.367% and 99.866%, respectively. Based on this comparison, the CNN-Random Forest model performed better at detecting pulmonary tuberculosis than the CNN-XGBoost model."
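The two metrics named in the abstract, accuracy and AUC, can be illustrated with a minimal pure-Python sketch. It is not the thesis code: the labels, scores, and the 0.5 decision threshold below are made-up assumptions, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than whatever tooling the author used.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = tuberculosis, 0 = normal) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
preds = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

print("accuracy:", accuracy(labels, preds))
print("AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank positives against negatives, which is one reason the two are often reported together.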

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

S-pdf

UI - Skripsi Membership  Universitas Indonesia Library

Gregorius Vidy Prasetyo

Metode easy ensemble dengan random forest untuk mengatasi masalah klasifikasi pada kelas data tidak seimbang = Easy ensemble with random forest to handle imbalanced data in classification

"ABSTRAK
In fields such as healthcare and retail, data with imbalanced categories are common. For example, patients with a particular disease may be relatively rare in a study, or fraudulent transactions may be significantly fewer than normal ones. This condition is known as imbalanced data, and it degrades model performance, especially on the minority class. Several methods have been developed to address imbalanced data; one recent method is Easy Ensemble. Easy Ensemble is claimed to overcome the negative effects of conventional approaches such as random under-sampling and to improve model performance in predicting the minority class. This thesis discusses the Easy Ensemble method and its application with the Random Forest model to the imbalanced data problem. Two empirical studies were carried out on real cases from the competition sites nhacks.id and kaggle.com. The majority-to-minority class proportions in the two datasets are 70:30 and 94:6. The results show that Easy Ensemble significantly improves the performance of the Random Forest classifier on the minority class. Before resampling the first dataset (nhacks.id), minority recall was only 0.47; after resampling it rose to 0.82. Likewise, for the second dataset (kaggle.com), minority recall was only 0.14 before resampling and rose significantly to 0.71 afterward.
ABSTRACT
In real-world problems, imbalanced data are common. For example, in medicine, the number of patients suffering from cancer is much smaller than the number of healthy patients. This condition can cause issues at the problem-definition level, the algorithm level, and the data level. Several methods have been developed to overcome these issues; one state-of-the-art method is Easy Ensemble. Easy Ensemble is claimed to improve model performance in classifying the minority class and to overcome the deficiency of random under-sampling. This thesis discusses the implementation of Easy Ensemble with Random Forest classifiers to handle the imbalance problem in a credit scoring case. The combined method is applied to two datasets taken from the data science competition websites nhacks.id and kaggle.com, with majority-to-minority class proportions of 70:30 and 94:6. The results show that resampling with Easy Ensemble improves the Random Forest classifier's performance on the minority class, as shown by a significant increase in minority-class recall after resampling. On the first dataset (nhacks.id), minority recall was only 0.49 before resampling and rose to 0.82 afterward. Likewise, on the second dataset (kaggle.com), minority recall was only 0.14 before resampling and rose significantly to 0.71 afterward."
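The resampling idea behind Easy Ensemble can be sketched roughly as follows: draw several random under-samples of the majority class, pair each with all minority samples, train one learner per balanced subset, and combine their predictions. This is a hedged illustration only. The threshold learner is a stand-in (the thesis uses Random Forest), the majority-vote combination is an assumption, and the toy data are made up apart from the 94:6 proportion quoted in the abstract.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, label); label 1 marks the minority class


def balanced_subsets(majority: Sequence[Sample], minority: Sequence[Sample],
                     n_subsets: int, seed: int = 0) -> List[List[Sample]]:
    """Draw n_subsets under-samples of the majority class, each joined with all minority samples."""
    rng = random.Random(seed)
    k = len(minority)
    return [rng.sample(list(majority), k) + list(minority) for _ in range(n_subsets)]


def fit_threshold(subset: Sequence[Sample]) -> Callable[[float], int]:
    """Stand-in base learner (not a Random Forest): cut at the midpoint of the class means."""
    xs0 = [x for x, y in subset if y == 0]
    xs1 = [x for x, y in subset if y == 1]
    cut = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: int(x >= cut)  # assumes the minority class has larger feature values


def majority_vote(models: Sequence[Callable[[float], int]], x: float) -> int:
    """Combine the per-subset learners by simple majority vote (an assumed combination rule)."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Recall for the given class: true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


# Toy data with a 94:6 class proportion (feature values are made up).
majority = [(float(i), 0) for i in range(94)]
minority = [(float(i), 1) for i in range(90, 96)]
subsets = balanced_subsets(majority, minority, n_subsets=5)
models = [fit_threshold(s) for s in subsets]

data = majority + minority
y_true = [y for _, y in data]
y_pred = [majority_vote(models, x) for x, _ in data]
print("minority recall:", recall(y_true, y_pred))
```

Because every subset contains all minority samples and an equal number of majority samples, each learner sees balanced classes, while the ensemble as a whole still draws on many different majority samples.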

2019

S-Pdf

UI - Skripsi Membership  Universitas Indonesia Library

Hamidah

Klasifikasi Data Stroke Menggunakan Minimally Spanned Support Vector Machine = Stroke Data Classification Using Minimally Spanned Support Vector Machine

"
Classifying stroke is a task that has to be done quickly and accurately in order to choose the correct initial treatment for the patient. When the correct initial treatment is delayed, disability or even death can follow. This study tackles stroke classification with a machine learning approach, using the Minimally Spanned Support Vector Machine (MSSVM) method. The method extends the Support Vector Machine (SVM) by applying the Minimum Spanning Tree (MST) algorithm to reduce the number of support vectors in the SVM. The goal is to cut the computation time the SVM requires and to improve its performance, since that computation time depends on the number of support vectors: the more support vectors there are, the longer the computation takes. Reducing the number of support vectors can also lower the generalization error and therefore improve performance. The performance of MSSVM was evaluated by comparing several parameters against the performance of SVM. MSSVM was found to reduce the number of support vectors in the SVM, which shortens the computation time the SVM needs to classify stroke data without degrading its performance.
Stroke classification is a problem that must be solved quickly and precisely to determine the right initial treatment for stroke sufferers. If the right initial treatment comes too late, this can cause disability and even death. This study solves the problem of stroke classification using a machine learning approach with the Minimally Spanned Support Vector Machine (MSSVM) method. MSSVM is a development of the Support Vector Machine (SVM) method that applies the Minimum Spanning Tree (MST) algorithm to reduce the number of support vectors in SVM. This aims to speed up the computation required by SVM and improve its performance. The computation time required by SVM depends on the number of support vectors: the more support vectors, the longer the computation takes. In addition, reducing the number of support vectors can yield smaller generalization errors and thus better performance. In this study, the performance of MSSVM was evaluated by comparing several parameters with the performance of SVM. The results show that MSSVM succeeded in reducing the number of support vectors in SVM, thus shortening the computation time needed by SVM to classify stroke data without reducing SVM performance.
"

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2020
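The abstract's claim that SVM computation time grows with the number of support vectors follows from the form of the kernel SVM decision function, f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b, which needs one kernel evaluation per support vector. The sketch below illustrates only that dependence; the RBF kernel, the gamma value, and the toy support vectors are assumptions, and it does not show how MSSVM uses the MST to choose which support vectors to keep, since the abstract does not specify that.

```python
import math
from typing import List, Sequence, Tuple

# Each entry pairs a support vector x_i with its coefficient alpha_i * y_i.
SupportVector = Tuple[Sequence[float], float]


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 0.5) -> float:
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2) (kernel choice is an assumption)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def decision_value(x: Sequence[float], support_vectors: List[SupportVector],
                   bias: float, gamma: float = 0.5) -> float:
    """Evaluate f(x); the loop performs exactly one kernel evaluation per support vector."""
    return sum(coef * rbf_kernel(sv, x, gamma) for sv, coef in support_vectors) + bias


# Hypothetical two-support-vector model, one per class.
toy_svs: List[SupportVector] = [([0.0, 0.0], 1.0), ([2.0, 0.0], -1.0)]

print(decision_value([0.0, 0.0], toy_svs, bias=0.0))  # positive side
print(decision_value([1.0, 0.0], toy_svs, bias=0.0))  # equidistant from both vectors
```

Halving the length of `support_vectors` halves the number of kernel evaluations per prediction, which is the saving a support-vector reduction method aims for.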

S-Pdf

UI - Skripsi Membership  Universitas Indonesia Library
