Search Results

27 documents found matching the query

Nicholas Ramos Richardo

Analisis Performa EFCM dengan BERT sebagai Representasi Teks pada Pendeteksian Topik = The Performance of EFCM with BERT as Text Representation on Topic Detection

"Topic detection is the process of determining the topics in a text by analyzing the words it contains. It can be done by reading the text, but this becomes increasingly difficult as the amount of data grows. Machine learning methods offer an alternative for handling large amounts of data. Clustering methods group similar items within a dataset; examples include K-Means, Fuzzy C-Means (FCM), and Eigenspace-Based Fuzzy C-Means (EFCM). EFCM is a clustering method that combines the Truncated Singular Value Decomposition (TSVD) dimensionality-reduction method with FCM (Murfi, 2018). In topic detection, text must be represented as numerical vectors because clustering models cannot process raw text. The method most commonly used until now has been Term Frequency-Inverse Document Frequency (TFIDF). In 2018 a new method was introduced: Bidirectional Encoder Representations from Transformers (BERT), a pretrained language model developed by Google. This study applies BERT with the EFCM clustering method to topic detection, and model performance is evaluated with the coherence metric. The simulation results show that determining topics with a modified TFIDF method outperforms the centroid-based method, giving higher coherence on two of the three datasets used. In addition, BERT outperforms TFIDF, achieving higher coherence than TFIDF on all three datasets."

Depok: Faculty of Mathematics and Natural Sciences, Universitas Indonesia, 2022

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library
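The EFCM thesis above describes EFCM as TSVD dimensionality reduction followed by fuzzy c-means. The following is a minimal pure-Python sketch of only the FCM step, assuming the document vectors have already been reduced by TSVD; the fuzzifier m = 2, the deterministic initialization, and the helper name `fuzzy_c_means` are illustrative assumptions, not details taken from the thesis. Reading topic words off the clusters (which the abstract says was done with either a centroid-based or a modified TFIDF method) is not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points: List[Vector], c: int, m: float = 2.0,
                  iters: int = 100, tol: float = 1e-6
                  ) -> Tuple[List[Vector], List[Vector]]:
    """Fuzzy c-means: alternate membership and centroid updates.

    Returns (centroids, memberships); memberships[i][j] is how strongly
    point i belongs to cluster j, and each row sums to 1.
    """
    n, dim = len(points), len(points[0])
    # Hypothetical deterministic initialization: c evenly spaced input points.
    centroids = [list(points[(k * n) // c]) for k in range(c)]
    u = [[0.0] * c for _ in range(n)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i, x in enumerate(points):
            d = [_dist(x, cj) for cj in centroids]
            zero = [j for j, dj in enumerate(d) if dj == 0.0]
            if zero:
                # Point sits exactly on centroid(s): share membership among them.
                u[i] = [1.0 / len(zero) if j in zero else 0.0 for j in range(c)]
            else:
                u[i] = [1.0 / sum((d[j] / d[k]) ** exponent for k in range(c))
                        for j in range(c)]
        # Centroid update: mean of all points weighted by u_ij ** m.
        new_centroids = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            new_centroids.append(
                [sum(w[i] * points[i][t] for i in range(n)) / total for t in range(dim)])
        shift = max(_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, u


if __name__ == "__main__":
    # Toy 2-D vectors standing in for TSVD-reduced document vectors.
    docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    cents, memb = fuzzy_c_means(docs, c=2)
    for row in memb:
        print([round(v, 3) for v in row])
```

Unlike hard K-Means assignments, every document here gets a graded membership in every cluster, which is what lets a fuzzy method express that a document touches several topics.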

Warren, Hans

Steen der hulp / Hans Warren

Amsterdam: Bert Bakker, 1990

BLD 839.36 WAR st

Textbook (Buku Teks) Universitas Indonesia Library

Carlo Johan Nikanor

Analisis Pengaruh Urutan Domain terhadap Lifelong Learning Berbasis BERT = An Analysis on the Effects of Sequences of Domains for BERT-Based Lifelong Learning

"Rapid technological advances have given the public more opportunities to share opinions and personal evaluations on social media and elsewhere on the internet. This has driven the growth of sentiment analysis, also called opinion mining, an application of machine learning. Machine learning models are usually trained on a single domain or dataset, but later developments produced lifelong learning, in which a model learns continually from multiple source domains. In 2022, Osmardifa compared the performance of the Bidirectional Encoder Representations from Transformers (BERT) model with Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models for lifelong learning. However, that comparison used only one ordering of the 5 source domains, out of 120 possible orderings. In this study, all orderings of the source domains were simulated on Osmardifa's dataset to measure how the learning order affects model performance. The simulations with BERT showed that many orderings performed better than the one used in the previous study. The order Capres – Jenius – Shopback – Ecom – Grab gave the highest accuracy, 82.49%, for retention of knowledge in the tests that use the Capres dataset as Source Domain 1, and the order Capres – Jenius – Grab – Ecom – Shopback gave the highest accuracy, 91.32%, for transfer of knowledge. These are increases of 1.53% and 1.72%, respectively, over Osmardifa's original simulation. Further analysis was carried out to see whether any pattern or reason could explain the differences in performance when the order of source domains changes, but no such pattern or reason was found."

Depok: Faculty of Mathematics and Natural Sciences, Universitas Indonesia, 2023

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library
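The figure of 120 orderings in the lifelong-learning thesis above is the number of permutations of five distinct source domains, 5! = 5 × 4 × 3 × 2 × 1 = 120. The sketch below enumerates them with Python's standard `itertools.permutations`; the domain names are those given in the abstract, while the variable names are illustrative.

```python
from itertools import permutations

# The five source domains named in the abstract.
DOMAINS = ["Capres", "Jenius", "Shopback", "Ecom", "Grab"]

# Every possible training order of the five domains (5! = 120).
orders = list(permutations(DOMAINS))

# The two best-performing orders reported in the abstract.
best_retention = ("Capres", "Jenius", "Shopback", "Ecom", "Grab")
best_transfer = ("Capres", "Jenius", "Grab", "Ecom", "Shopback")

# Orders that keep Capres as Source Domain 1 (4! = 24 of them).
capres_first = [o for o in orders if o[0] == "Capres"]

if __name__ == "__main__":
    print(len(orders))        # 120
    print(len(capres_first))  # 24
    print(best_retention in orders, best_transfer in orders)  # True True
```

Fixing the first domain leaves 4! = 24 orders, which is the subset relevant to the retention results reported with Capres as Source Domain 1.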

Muhammad Anwar Farihin

Pengenalan Entitas Bernama pada Twit Berbahasa Indonesia Menggunakan Model Pre-Trained BERT = BERT Pre-Trained Language Model for Named Entity Recognition on Indonesian Tweets

"Named Entity Recognition (NER) has been studied extensively, particularly on English corpora. However, there has been very little NER research on Indonesian-language tweets because few datasets are publicly available. BERT, one of the state-of-the-art models for NER, had not yet been applied to Indonesian tweet corpora. Our contribution is a new NER dataset of 7,426 Indonesian-language tweets, together with experiments with CRF and BERT models on that dataset. The best model in this study achieved an F1 score of 72.35% in token-level evaluation, and F1 scores of 79.27% (partial match) and 75.40% (exact match) in entity-level evaluation."

Depok: Faculty of Computer Science, Universitas Indonesia, 2021

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library
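The NER thesis above reports entity-level F1 under both exact and partial matching. A minimal pure-Python sketch of the difference follows. The exact rule (same boundaries and type) is standard; the partial rule used here (same type, at least one overlapping token) is an assumption, since the abstract does not define it and other schemes, such as giving half credit for overlaps, also exist. Spans are hypothetical `(start, end_exclusive, type)` tuples.

```python
from typing import List, Tuple

# An entity span: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def _overlaps(a: Span, b: Span) -> bool:
    """True if two spans share the same type and at least one token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def entity_f1(gold: List[Span], pred: List[Span], partial: bool = False) -> float:
    """Entity-level F1 under exact or (assumed, type-aware) partial matching.

    Exact: a predicted span must equal a gold span (boundaries and type).
    Partial: spans of the same type match if they overlap by >= 1 token.
    """
    match = _overlaps if partial else (lambda a, b: a == b)
    tp_pred = sum(1 for p in pred if any(match(p, g) for g in gold))
    tp_gold = sum(1 for g in gold if any(match(p, g) for p in pred))
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 2, "PER"), (5, 6, "LOC")]
    pred = [(0, 1, "PER"), (5, 6, "LOC")]  # first entity's boundary is cut short
    print(entity_f1(gold, pred))                # 0.5
    print(entity_f1(gold, pred, partial=True))  # 1.0
```

A prediction that finds the right entity but clips its boundary is a miss under exact matching and a hit under partial matching, which is consistent with the partial-match F1 in the abstract being higher than the exact-match F1.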

Gilang Setyawan Yoga Pratama

Rancang Bangun Model Normalisasi Alamat di Indonesia Berbasis CNN dan BERT = Design of Address Normalization Model in Indonesia Based on CNN and BERT

"An address is information used to indicate the location of a place. It contains several components, such as street name, house number, RT/RW (neighborhood and community units), sub-district (kelurahan), district (kecamatan), city/regency, province, and postal code. An address serves as the geographic identity of a place and is used for communication, delivery of goods, administration, and other services. Address normalization is the process of making these components uniform and accurate. The dataset was built independently by web scraping addresses with the help of Google Maps; the addresses were then preprocessed and split into training and test data for training and testing the model. This research focuses on developing machine learning models based on convolutional neural networks (CNN) and Bidirectional Encoder Representations from Transformers (BERT), evaluated by accuracy, precision, recall, and F1-score. Three models were developed: CNN, BERT, and a combined CNN + BERT model. The CNN model achieved an accuracy of 89%, BERT 23%, and the CNN + BERT combination 27%. The best model, CNN, was therefore selected for evaluation on the test data and for manual testing in the terminal, where a user enters an address directly and receives the standardized address as output."

Depok: Faculty of Engineering, Universitas Indonesia, 2025

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library

Maulana Nurhendronoto

Klasifikasi Emosi Berbasis Teks Bahasa Indonesia dengan Perbandingan CNN, LSTM dan BERT = Indonesian Text Based Emotion Classification with Comparison of CNN, LSTM and BERT

"Emotions are feelings that arise in a person in response to a particular situation, and they can affect that person's thoughts, behavior, and perception of events. Emotion classification is a branch of sentiment analysis that aims to analyze and extract emotions from data. Research on text-based emotion classification is worthwhile because it can be applied in many fields, such as health and education. Indonesian ranks 11th among the world's most spoken languages, with 200 million speakers, yet research on Indonesian text-based emotion classification remains scarce. Machine learning algorithms can address several challenges in emotion classification, such as understanding emotions and analyzing them in unstructured data. This research develops machine learning models using convolutional neural networks (CNN), long short-term memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT). In testing, CNN achieved an F1 score of 84.2%, LSTM 82%, BERT en uncased 22%, and BERT multi cased 32%. These results indicate that CNN performed best and BERT en uncased performed worst of the four methods."

Depok: Faculty of Engineering, Universitas Indonesia, 2024

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library

Dwi Guna Mandhasiya

Hibrida Model BERT dan Deep Learning untuk Analisis Sentimen Bahasa Indonesia - Studi Kasus Sentimen Pengguna Media Sosial terkait Pilpres 2024 = The Hybrid of BERT and Deep Learning Models for Indonesian Sentiment Analysis - A Case Study of Social Media User Sentiments Regarding the 2024 Indonesia Presidential Election

"Data science is the intersection of mathematics and statistics, computer science, and domain expertise. In recent years, innovation in data science has advanced rapidly, for example in Artificial Intelligence (AI), which has greatly assisted human life. Deep Learning (DL), a part of AI, is a development of one machine learning model, the neural network. With many neural network layers, a deep learning model can perform feature extraction and classification within a single architecture. Such models have been shown to outperform state-of-the-art machine learning techniques in fields such as pattern, speech, and image recognition and text classification. Deep learning models have surpassed other AI-based approaches in a variety of text classification tasks, including sentiment analysis. Text data can come from many sources, such as social media. Sentiment analysis, or opinion mining, is the computational study of opinions and emotions expressed in text. This research analyzes the performance of deep learning methods built on BERT data representations: CNN and LSTM, as well as the hybrid deep learning methods CNN-LSTM and LSTM-CNN. The models are implemented on YouTube comments on political videos about the 2024 Indonesian presidential election (Pilpres 2024), and performance is evaluated with confusion-matrix metrics: accuracy, precision, and recall."

Depok: Faculty of Mathematics and Natural Sciences, Universitas Indonesia, 2023

T-pdf

UI - Master's Thesis (Tesis) Membership Universitas Indonesia Library
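The sentiment-analysis thesis above evaluates its models with accuracy, precision, and recall derived from a confusion matrix. A minimal pure-Python sketch of those definitions follows; the three-class labelling (0 = negative, 1 = neutral, 2 = positive) and the toy predictions are hypothetical, since the abstract does not state the label set.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> Matrix:
    """cm[t][p] counts samples whose true class is t and predicted class is p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm: Matrix) -> float:
    """Correct predictions (the diagonal) over all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total if total else 0.0


def precision(cm: Matrix, k: int) -> float:
    """Of everything predicted as class k, the fraction that truly is k."""
    predicted_k = sum(cm[t][k] for t in range(len(cm)))
    return cm[k][k] / predicted_k if predicted_k else 0.0


def recall(cm: Matrix, k: int) -> float:
    """Of everything truly in class k, the fraction predicted as k."""
    actual_k = sum(cm[k])
    return cm[k][k] / actual_k if actual_k else 0.0


if __name__ == "__main__":
    # Hypothetical labels: 0 = negative, 1 = neutral, 2 = positive.
    y_true = [0, 0, 1, 2, 2, 2]
    y_pred = [0, 1, 1, 2, 2, 0]
    cm = confusion_matrix(y_true, y_pred, n_classes=3)
    print(cm)  # [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
    print(round(accuracy(cm), 3))  # 0.667
    print([round(precision(cm, k), 3) for k in range(3)])  # [0.5, 0.5, 1.0]
    print([round(recall(cm, k), 3) for k in range(3)])     # [0.5, 1.0, 0.667]
```

Precision reads down a column of the matrix and recall reads across a row, so a model can score perfect precision on a class (class 2 here) while still missing some of its true members.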

Zihan Nindia

Analisa Long-Short Term Memory dan BERT Embeddings pada Klasifikasi Teks Data SMS Spam Berbahasa Indonesia = Analysis of Long-Short Term Memory and BERT Embeddings on Text Classification of SMS Spam Data in Indonesian

"The rapid development of information and communication technology has brought many changes to human life. One of the most significant developments is the Short Message Service (SMS). SMS is often misused to defraud phone users: fraudsters send messages massively and at random, up to ten thousand per day, to all users, so these messages become spam for many people. In this study, text classification with Long Short-Term Memory (LSTM) and BERT embeddings is used to classify SMS messages into two categories: spam and non-spam (ham). The data consist of 5,575 labeled SMS messages. Using the LSTM + BERT method, this research achieves an accuracy of 97.85%, better than the three previous models; the accuracy of LSTM + BERT is 0.65% higher than that of LSTM alone."

Depok: Faculty of Engineering, Universitas Indonesia, 2024

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library

Agung Firmansyah

Pengembangan Sistem Penilaian Esai Otomatis (SIMPLE-O) Bahasa Indonesia Menggunakan BERT (Bidirectional Encoder Representations from Transformers) dan BILSTM (Bidirectional Long-Short Term Memory) = The Development of an Automatic Essay Scoring System (SIMPLE-O) for Indonesia Using BERT (Bidirectional Encoder Representations from Transformers) and BILSTM (Bidirectional Long-Short Term Memory)

"This paper presents the development of an Automatic Essay Scoring System (SIMPLE-O) for Indonesian using BERT (Bidirectional Encoder Representations from Transformers) and Bidirectional LSTM. BERT is used to produce sentence embeddings of both student and lecturer answers, which are then processed by a Bidirectional LSTM. The similarity between answers is measured with Manhattan Distance and Cosine Similarity. The test results show that the average absolute difference between the model score and the human rater score is 22.83, with an MAE of 0.2462 and an RMSE of 0.2850, for Manhattan Distance, and 12.88, with an MAE of 0.1614 and an RMSE of 0.1946, for Cosine Similarity."

Depok: Faculty of Engineering, Universitas Indonesia, 2024

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library
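The SIMPLE-O thesis above compares answer embeddings with Manhattan distance and cosine similarity and reports MAE and RMSE against human scores. A pure-Python sketch of these formulas follows. Converting Manhattan distance into a similarity via exp(-L1) is an assumption borrowed from the common Siamese "MaLSTM" setup; the abstract does not say how the thesis turns distance into a score, and the toy vectors stand in for real BERT embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def manhattan_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """exp(-L1 distance): maps distance into (0, 1] (MaLSTM-style; assumed)."""
    l1 = sum(abs(x - y) for x, y in zip(a, b))
    return math.exp(-l1)


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between model scores and human scores."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error; penalizes large disagreements more than MAE."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


if __name__ == "__main__":
    # Toy 3-D vectors standing in for student and lecturer answer embeddings.
    student = [0.1, 0.3, 0.5]
    lecturer = [0.2, 0.3, 0.4]
    print(round(cosine_similarity(student, lecturer), 4))
    print(round(manhattan_similarity(student, lecturer), 4))
    # Hypothetical model scores vs. human scores on a 0-1 scale.
    print(round(mae([0.8, 0.5], [0.6, 0.9]), 4), round(rmse([0.8, 0.5], [0.6, 0.9]), 4))
```

Cosine similarity ignores vector length and compares only direction, while the Manhattan-based score also reacts to differences in magnitude, which is one reason the two measures can rank the same answers differently.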

Deandra Setyaputri

Rancang Bangun Forum Diskusi Online untuk E-Learning dengan Opsi Partisipasi Anonim dan Sistem Moderasi Otomatis Berbasis Bidirectional Encoder Representations from Transformers (BERT) = Development of an Online Discussion Forum for E-Learning with an Anonymous Participation Option and Automatic Moderation System Based On Bidirectional Encoder Representations from Transformers (BERT)

"In education, students' participation in class can be one of the factors supporting effective learning. To encourage participation, this study develops an online discussion forum for e-learning with an anonymous participation feature, which lets students post without revealing their real identities. The option to participate anonymously can increase students' willingness to take part in learning activities such as asking and answering questions and expressing opinions in class. However, anonymity can also invite bad behavior because it reduces accountability. To address this, the study also develops an automatic moderation system for the forum that uses a deep learning abusive-language detection model based on Bidirectional Encoder Representations from Transformers (BERT). Whenever a user submits a post, the model first classifies its text as ‘abusive’ if it contains abusive, offensive, or hateful content and as ‘safe’ otherwise, and the system automatically prevents posts classified as ‘abusive’ from being published. The abusive-language detection model was trained by fine-tuning IndoBERT, a BERT-based pre-trained model for Indonesian, and IndoBERTweet, which was trained for the Twitter domain. In testing, the best-performing model was the fine-tuned IndoBERTweet, with an F1 score of 91.02%. The time the model takes to produce a prediction grows with the number of characters in the input, but peaks at roughly 1.2 to 1.3 seconds because the model can only process a limited number of input tokens."

Depok: Faculty of Engineering, Universitas Indonesia, 2023

S-pdf

UI - Undergraduate Thesis (Skripsi) Membership Universitas Indonesia Library
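The discussion-forum thesis above describes a moderation gate: each post is classified before publication and blocked if labelled ‘abusive’. The sketch below shows only that control flow. `classify` is a placeholder for the fine-tuned IndoBERTweet model, `toy_classifier` is a hypothetical keyword stub used purely for illustration, and the whitespace truncation with `max_tokens=128` is an assumed stand-in for the model's real subword-token limit, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable

# Labels used by the abusive-language classifier described in the abstract.
ABUSIVE = "abusive"
SAFE = "safe"


@dataclass
class PostResult:
    accepted: bool
    label: str


def submit_post(text: str, classify: Callable[[str], str],
                max_tokens: int = 128) -> PostResult:
    """Classify a post before publishing; block it if labelled 'abusive'.

    The input is first cut to `max_tokens` whitespace tokens, mimicking a
    model input-length limit (hypothetical value and tokenization).
    """
    truncated = " ".join(text.split()[:max_tokens])
    label = classify(truncated)
    return PostResult(accepted=(label != ABUSIVE), label=label)


def toy_classifier(text: str) -> str:
    """Hypothetical keyword stub, NOT the IndoBERTweet model."""
    blocked = {"badword"}  # placeholder vocabulary for illustration only
    return ABUSIVE if any(w.lower() in blocked for w in text.split()) else SAFE


if __name__ == "__main__":
    print(submit_post("Could someone explain question 3?", toy_classifier))
    print(submit_post("this is a badword", toy_classifier))
```

Truncating before classification bounds the work per prediction, which is consistent with the abstract's observation that prediction time levels off once inputs exceed the token limit; the trade-off is that content past the cutoff is never inspected.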
