Search Results

Found 25 documents matching the query

Nabila Khansa

Deteksi Ujaran Kebencian dan Bahasa Kasar pada Blog Mikro Berbahasa Indonesia = Detection of Hate Speech and Abusive Language on Indonesian Microblogs

"Ujaran kebencian dan bahasa kasar mempermudah penyebaran kekerasan di kehidupan nyata, sehingga muncul urgensi adanya pendeteksian secara otomatis. Untuk melanjutkan pekerjaan yang sudah dilakukan oleh Ibrohim dan Budi (2019), penelitian ini membahas dua isu terkait deteksi ujaran kebencian dan bahasa kasar pada mikroblog berbahasa Indonesia. Isu pertama adalah kajian terkait effect size fitur dan pengembangan model menggunakan fitur-fitur tersebut. Metode Analysis of Variance f-test, Logistic Regression Analysis, dan nilai Shapley digunakan untuk melakukan kajian effect size pada fitur-fitur yang dirancang secara manual. Kemudian, digunakan beberapa algoritma pemelajaran mesin untuk mengembangkan model prediksi berbasis fitur-fitur tersebut. Isu kedua adalah kajian bias dalam pengembangan model terkait keberadaan kata-kata bersifat netral pada data yang merupakan ujaran kebencian atau bahasa kasar. Kajian terkait bias dilakukan dengan menggunakan dataset uji bias. Dataset ini dikembangkan dengan menggantikan kata-kata yang dideteksi memiliki potensi adanya bias pada model yang dilatih menggunakan dataset hasil pekerjaan Ibrohim dan Budi (2019). Penelitian ini menunjukkan bahwa keberadaan kata-kata tertentu berpengaruh terhadap hasil deteksi ujaran kebencian dan bahasa kasar. Di antara kata-kata tersebut, terdeteksi beberapa kata-kata yang berpotensi bias, karena memiliki pengaruh terhadap pendeteksian padahal secara sendiri kata-kata yang dideteksi sebagai potensi bias tidak memiliki unsur kebencian atau bersifat kasar. Hasil evaluasi pengambilan sampel bootstrap menunjukkan Logistic Regression dan XGBoost sebagai model dengan akurasi terbaik dalam pendeteksian ujaran kebencian dan bahasa kasar. Namun, ketika model yang sudah dikembangkan digunakan untuk memprediksi dataset sintetis, didapatkan penurunan akurasi dalam pendeteksian ujaran kebencian. Hasil ini menandakan adanya bias pada model yang dikembangkan. 
Hasil tersebut didukung juga oleh hasil prediksi dengan akurasi rendah ketika model digunakan untuk melakukan pendeteksian ujaran kebencian pada dataset yang dikembangkan secara manual, tetapi ketika kata-kata bias digantikan dari data, akurasi model meningkat. Kontribusi yang diberikan oleh penelitian ini adalah pengembangan dataset uji bias secara otomatis dari dataset yang dikembangkan oleh Ibrohim dan Budi (2019) dan juga dataset uji bias yang dikembangkan secara manual.

Hate speech and abusive language facilitate the spread of violence in real life, hence the urgency of automatic detection. To continue the work done by Ibrohim and Budi (2019), this research addresses two issues related to the detection of hate speech and abusive language on Indonesian-language microblogs. The first issue is a study of the effect size of features and the development of models using these features. Analysis of Variance F-test, Logistic Regression Analysis, and Shapley values are used to investigate the effect size of manually designed features. Several machine learning algorithms are then employed to develop prediction models based on these features. The second issue is a study of bias in model development related to the presence of neutral words in data that constitute hate speech or abusive language. The bias study is conducted using a bias test dataset. This dataset is developed by replacing words detected as potential sources of bias in models trained on the dataset from the work of Ibrohim and Budi (2019). This research demonstrates that certain words significantly influence the detection of hate speech and abusive language. Among these words, some are identified as potentially biased, because they affect detection even though on their own they contain no hateful or abusive element. Bootstrap sampling evaluation indicates that Logistic Regression and XGBoost are the models with the highest accuracy in detecting hate speech and abusive language. However, when the developed models are used to predict the synthetic dataset, a significant decrease in accuracy is observed in hate speech detection. This finding indicates the presence of bias in the developed models. The result is further supported by low-accuracy predictions when the models are used to detect hate speech in the manually developed dataset, whereas accuracy improves significantly when the biased words are replaced in the data. The contributions of this research are a bias test dataset generated automatically from the dataset created by Ibrohim and Budi (2019) and a manually developed bias test dataset."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
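
The bias test described in the abstract above (replace words suspected of causing bias, then re-check accuracy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `build_bias_test_set` and `accuracy`, the placeholder word `wargax`, and the toy keyword-based predictor are all hypothetical.

```python
import re

def build_bias_test_set(texts, replacements):
    """Swap each flagged word (lowercase keys) for its substitute, ignoring case."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in replacements) + r")\b",
        flags=re.IGNORECASE,
    )
    return [
        pattern.sub(lambda m: replacements[m.group(0).lower()], text)
        for text in texts
    ]

def accuracy(predict, texts, labels):
    """Fraction of texts whose predicted label equals the gold label."""
    hits = sum(predict(text) == label for text, label in zip(texts, labels))
    return hits / len(texts)

# Hypothetical stand-in for a biased model: it flags any text containing
# the neutral placeholder word "wargax" as hate speech (label 1).
def toy_predict(text):
    return int("wargax" in text.lower())

texts = ["wargax datang ke pasar", "Wargax itu ramah"]
labels = [0, 0]  # neither sentence is hate speech

replaced = build_bias_test_set(texts, {"wargax": "orang"})
acc_before = accuracy(toy_predict, texts, labels)
acc_after = accuracy(toy_predict, replaced, labels)
print(acc_before, acc_after)
```

In this toy setup accuracy rises once the flagged word is replaced, which is the kind of signal the abstract interprets as bias tied to a neutral word.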

Bern Jonathan

Penerapan text mining untuk klasifikasi konten menjual produk di media sosial Female Daily = The use of text mining for classification of product selling content in social media Female Daily

"Female Daily Network perusahaan yang bergerak di bidang media sosial. Female Daily memiliki media sosial untuk membagikan pengalaman menggunakan produk kecantikan bernama Female Daily. Female Daily memiliki peraturan untuk tidak menggunakan Female Daily Platform untuk mempromosikan, menjual produk, dan layanan di platform media sosial di Female Daily. Namun, pengguna di Female Daily terkadang melanggar peraturan tersebut di post mereka dan menyebabkan pengguna lain terganggu akan hal tersebut. Admin di Female Daily kesulitan untuk mengidentifikasi pengguna yang melanggar aturan itu dan melarang post mereka yang berisi penjualan produk karena keterbatasan jumlah admin dengan jumlah post yang masuk tiap hari. Text mining juga dapat mengatasi permasalahan ini dengan menentukan klasifikasi secara otomatis dengan membuat sistem yang melakukan proses pembelajaran dengan dari kata-kata post yang tersedia. Algoritme yang bisa digunakan untuk melakukan proses text mining pada penelitian ini seperti Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT), dan Random Forest (RF). Penelitian ini menggunakan kombinasi cara ekstraksi fitur, fitur kontekstual, dan melakukan balancing data. Penelitian ini menggunakan skenario penelitian untuk menganalisis ekstraksi fitur, penggunaan fitur kontekstual, dan balancing data. Algoritme terbaik dilihat dari nilai recall pada kombinasi algoritme dan fitur penelitian ini adalah Random Forest TF-IDF Unigram dan menggunakan tambahan fitur kontekstual deteksi uang dan kata-kata menjual dengan data yang seimbang. Nilai recall 88.37% didapatkan dari hasil kombinasi algoritme dan fitur tersebut.

Female Daily Network is a social media company. It runs a social media platform, also called Female Daily, where users share their experiences with beauty products. Female Daily's rules prohibit using the platform to promote or sell products and services. However, users sometimes violate this rule in their posts, which annoys other users. Admins have difficulty identifying violating users and banning posts that contain product sales, because the number of admins is small relative to the number of posts submitted each day. Text mining can address this problem by classifying posts automatically with a system that learns from the words of the available posts. The algorithms used for text mining in this research are Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT), and Random Forest (RF). The study combines feature extraction methods, contextual features, and data balancing, and uses research scenarios to analyze each of them. Judged by recall, the best combination of algorithm and features is Random Forest with TF-IDF unigrams plus the contextual features of money detection and selling words, on balanced data. This combination achieves a recall of 88.37%."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2021

T-Pdf

UI - Tesis Membership Universitas Indonesia Library
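
The contextual features mentioned in the abstract above (money detection and selling words) could be approximated along these lines. This is only a sketch: the abstract does not list its rules, so the regular expression `MONEY_RE`, the word set `SELLING_WORDS`, and the function name `contextual_features` are assumptions made for illustration.

```python
import re

# Hypothetical money pattern: "Rp 50.000", "Rp.50000", "50rb", "50 ribu", "2jt".
MONEY_RE = re.compile(
    r"\b(?:rp\.?\s?\d[\d.,]*|\d+\s?(?:rb|ribu|k|jt|juta))\b",
    re.IGNORECASE,
)

# Hypothetical list of selling-related words; the thesis's list is not given.
SELLING_WORDS = {"jual", "order", "promo", "diskon", "ready", "harga"}

def contextual_features(post):
    """Return two hand-crafted features for one post."""
    tokens = re.findall(r"[a-z]+", post.lower())
    return {
        "has_money": int(bool(MONEY_RE.search(post))),
        "selling_word_count": sum(token in SELLING_WORDS for token in tokens),
    }

selling_post = contextual_features("Jual serum ready stock, harga Rp 50.000")
review_post = contextual_features("Serum ini bikin kulit lembap")
print(selling_post)
print(review_post)
```

Features like these would be appended to the TF-IDF unigram vector before training a classifier such as Random Forest.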

Deden Ade Nurdeni

Penggalian informasi untuk identifikasi permintaan bantuan korban bencana alam menggunakan data twitter = Extracting information to identify assistance for natural disaster victims using twitter data.

"Kajian risiko bencana di Indonesia oleh BNPB menunjukkan jumlah jiwa terpapar risiko bencana tersebar di seluruh Indonesia dengan total potensi jiwa terpapar lebih dari 255 juta jiwa. Hasil kajian ini menunjukkan bahwa dampak bencana di Indonesia terbilang sangat tinggi. Sistem penanggulangan khususnya pada masa tanggap darurat menjadi hal yang krusial untuk dapat meminimalisir risiko. Namun, pemberian bantuan kepada korban bencana terkendala beberapa hal, antara lain keterlambatan dalam penyaluran, kurangnya informasi lokasi korban, dan distribusi bantuan yang tidak merata. Untuk memberikan informasi yang cepat dan tepat, BNPB telah membangun beberapa sistem informasi seperti DIBI, InAware, Geospasial, Petabencana.id dan InaRisk. Akan tetapi tidak secara realtime menampilkan wilayah terdampak bencana dengan menunjukkan jenis kebutuhan bantuan apa yang dibutuhkan korban pada saat itu. Untuk memberikan solusi atas permasalahan tersebut, penelitian ini membangun model yang mampu mengklasifikasikan data teks dari Twitter terkait bencana ke dalam jenis bantuan yang diminta oleh korban bencana secara realtime. Selain itu visualisasi berupa dashboard dibangun dalam bentuk aplikasi berbasis peta untuk menampilkan lokasi korban yang terdampak. Penelitian ini menggunakan teknik text mining mengolah data Twitter dengan pendekatan metode klasifikasi multi label dan ekstraksi informasi lokasi menggunakan metode Stanford NER. Algoritme yang digunakan adalah Naive Bayes, Support Vector Machine, dan Logistic Regression dengan kombinasi metode transformasi data multi label OneVsRest, Binary Relevance, Label Power-set, dan Classifier Chain. Representasi teks menggunakan N-Grams dengan pembobotan TF-IDF. Model terbaik untuk klasifikasi multi label pada penelitian ini adalah kombinasi Support Vector Machine dan Classifier Chain dengan fitur UniGram+BiGram dengan nilai precision 82%, recall 70%, dan F1-score 75%.
Stanford NER menghasilkan F1-score 83% untuk klasifikasi lokasi yang menjadi masukan untuk teknik geocoding. Hasil geocoding berupa informasi spasial ditampilkan dalam bentuk dashboard berbasis peta.

A disaster risk study of Indonesia by BNPB shows that people exposed to disaster risk are spread throughout Indonesia, with more than 255 million people potentially exposed. This indicates that the impact of disasters in Indonesia is very high. The response system, especially during the emergency response period, is crucial for minimizing risk. However, providing assistance to disaster victims is hampered by several things, including delays in distribution, lack of information on the location of victims, and uneven distribution of aid. To provide fast and accurate information, BNPB has built several information systems such as DIBI, InAware, Geospatial, Petabencana.id and InaRisk. However, these systems do not show disaster-affected areas in real time together with the type of assistance victims need at that moment. To address this problem, this study builds a model that classifies disaster-related text data from Twitter into the type of assistance requested by disaster victims in real time. In addition, a dashboard is built as a map-based application to display the locations of affected victims. This study uses text mining techniques to process Twitter data with a multi-label classification approach and extracts location information using Stanford NER. The algorithms used are Naive Bayes, Support Vector Machine, and Logistic Regression, combined with the multi-label data transformation methods OneVsRest, Binary Relevance, Label Power-set, and Classifier Chain. Text is represented using N-grams with TF-IDF weighting. The best model for multi-label classification in this study is the combination of Support Vector Machine and Classifier Chain with UniGram+BiGram features, with 82% precision, 70% recall, and 75% F1-score. Stanford NER achieves an F1-score of 83% for location classification, which is the input for geocoding. The geocoding results, in the form of spatial information, are displayed in a map-based dashboard."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2021

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library
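
As a quick check on the figures in the abstract above, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the reported 82% precision and 70% recall; the function name `f1_score` is introduced here for illustration, and it assumes the reported numbers are directly comparable (if they were averaged per label, the harmonic mean of the averages need not match exactly).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported values for the best multi-label model (SVM + Classifier Chain).
f1 = f1_score(0.82, 0.70)
print(round(f1, 4))  # 0.7553, which rounds to the reported 75%
```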

Rahmatul Mahdalina

Analisis Kepuasan Pelanggan E-commerce pada Microblogs = E-commerce Customer Satisfaction Analysis on Microblogs

"Teknologi mempengaruhi berbagai lini kehidupan masyarakat. Penggunaan transportasi, memenuhi kebutuhan seperti belanja, bahkan juga untuk bersosialisasi semuanya menggunakan teknologi. Aktivitas seperti jual beli juga pun terpengaruh, hingga mendorong munculnya e-commerce. E-commerce tentu berkaitan dengan pelanggan maupun pengguna e-commerce yang mengungkapkan berbagai opininya. Beberapa opini disampaikan melalui media sosial yang dimiliki oleh e-commerce salah satunya adalah Twitter. Opini inilah yang menarik dieksplorasi untuk diketahui sentimen dan tingkat kepuasan dari pelanggan e-commerce. Oleh karenanya, penelitian ini bertujuan untuk menyusun aspek kepuasan pelanggan dan juga menentukan metode Lexicon yang relevan untuk analisis sentimen. Data diambil dari microblogs Twitter terbatas pada 5 penyelenggara e-commerce yaitu Blibli, Bukalapak, Lazada, Shopee, dan Tokopedia sebanyak 88.816 tweet. Dikategorisasikan ke dalam aspek-aspek sehingga menjadi 12.995 tweet. Aspek-aspek kepuasan pelanggan disusun melalui studi literatur dan menghasilkan 6 aspek. Keenam aspek tersebut adalah aspek produk, penjual, logistik, harga, layanan, dan sistem. Kategorisasi tweet ke dalam aspek menggunakan kata kunci berkaitan dengan aspek-aspek sebanyak 73 kata kunci yang dihasilkan dari analisis hasil word cloud dan topic modelling yang menggunakan LDA. Untuk pemilihan metode lexicon dibuat skenario yang diujikan pada 300 tweet yang dilabeli secara manual dan dipilih secara acak dari data aspek. Skenario yang dilakukan pada data berlabel ada dua, yaitu menggunakan Lexicon 1 dan Lexicon 2. Lexicon 1 adalah perbandingan kamus, sedangkan Lexicon 2 merupakan perbandingan rumus yang berbeda. Hasil Lexicon 1 adalah seluruh kamus memiliki nilai akurasi sama yaitu 0,54. Sedangkan pada Lexicon 2 memiliki nilai akurasi tertinggi yaitu 0,46 dari kamus berskala 1 dengan rumus pertama. Sehingga, metode terpilih adalah Lexicon 1 menggunakan kamus InSet. 
Penerapan kategorisasi aspek menghasilkan bahwa aspek dominan pada masing-masing e-commerce dan pada keseluruhan e-commerce adalah aspek produk. Penerapan metode berbasis leksikon pada analisis sentimen menghasilkan bahwa di seluruh e-commerce pada setiap aspeknya memiliki sentimen dominan positif. Implikasi dari penelitian ini adalah bertambahnya khazanah ilmu pengetahuan berkaitan kepuasan pelanggan dan bervariasinya kamus serta metode berbasis leksikon yang dapat menjadi referensi dan penelitian lebih lanjut. Selain itu, bagi penyelenggara e-commerce penelitian ini dapat membantu analisis untuk peningkatan maupun pengambilan kebijakan.

Technology affects many aspects of people's lives. Transportation, meeting needs such as shopping, and even socializing all rely on technology. Buying and selling has also been affected, which has encouraged the emergence of e-commerce. E-commerce involves customers and users who express various opinions. Some opinions are conveyed through the social media accounts of e-commerce companies, one of which is Twitter. These opinions are worth exploring to determine the sentiment and satisfaction level of e-commerce customers. Therefore, this study aims to compile aspects of customer satisfaction and to determine the lexicon method relevant to sentiment analysis. Data was taken from Twitter microblogs, limited to 5 e-commerce organizers, namely Blibli, Bukalapak, Lazada, Shopee, and Tokopedia, with a total of 88,816 tweets. After categorization into aspects, 12,995 tweets remained. The aspects of customer satisfaction were compiled through a literature study, which produced six aspects: product, seller, logistics, price, service, and system. Tweets were categorized into aspects using 73 aspect-related keywords obtained from analysis of word cloud results and topic modeling with LDA.
For the selection of the lexicon method, scenarios were tested on 300 manually labeled tweets selected at random from the aspect data. Two scenarios were run on the labeled data, Lexicon 1 and Lexicon 2. Lexicon 1 compares dictionaries, while Lexicon 2 compares different formulas. In Lexicon 1, all dictionaries have the same accuracy of 0.54. In Lexicon 2, the highest accuracy is 0.46, obtained from the scale-1 dictionary with the first formula. The chosen method is therefore Lexicon 1 using the InSet dictionary. Aspect categorization shows that the dominant aspect, for each e-commerce organizer and overall, is the product aspect. Lexicon-based sentiment analysis shows that every aspect of every e-commerce organizer has predominantly positive sentiment. This research adds to the body of knowledge on customer satisfaction and to the variety of dictionaries and lexicon-based methods available as references for further research. In addition, for e-commerce organizers, this research can support analysis for improvement and policy making."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library
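
The lexicon-based approach in the abstract above can be illustrated with a small sketch: sum the dictionary weights of the words in a tweet and read the sign of the total, then measure accuracy against manual labels. Everything concrete here is an assumption: the toy `LEXICON` entries and weights are made up and are not taken from InSet, and the scoring rule is just one simple formula, not necessarily either of the formulas the thesis compares.

```python
import re

# Hypothetical toy lexicon with made-up integer weights (not InSet entries).
LEXICON = {
    "bagus": 4, "cepat": 3, "puas": 4,
    "lambat": -3, "rusak": -4, "kecewa": -4,
}

def sentiment(tweet, lexicon=LEXICON):
    """Label a tweet by the sign of the summed word weights."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def accuracy(predicted, gold):
    """Share of predictions that match the manual labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

tweets = [
    "Pengiriman cepat, barang bagus",
    "Barang rusak, kecewa",
    "Paket sudah sampai",
]
gold = ["positive", "negative", "positive"]  # hypothetical manual labels
predicted = [sentiment(t) for t in tweets]
print(predicted, accuracy(predicted, gold))
```

The third tweet contains no lexicon words and is scored neutral, which shows how dictionary coverage limits accuracy in this kind of method.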

Anton Ade Putra

Analisis Sentimen dan Pemodelan Topik terkait Metaverse di Media Sosial: Studi Kasus Bidang Riset dan Pengembangan Metaverse Universitas T = Sentiment Analysis and Topic Modeling Related to Metaverse in Social Media: A Case Study at the Metaverse Research and Development Department of University T

"Universitas T memiliki rencana (roadmap) untuk mengembangkan berbagai jenis Metaverse di masa depan. Namun, ada kekhawatiran bahwa roadmap yang telah dibuat mungkin tidak sesuai dengan kebutuhan masyarakat. Oleh karena itu, penelitian ini bertujuan untuk menganalisis sentimen dan pemodelan topik tentang Metaverse di media sosial guna memberikan wawasan yang penting bagi roadmap pengembangan Metaverse di Universitas T dengan memperhatikan pendapat dan sentimen masyarakat. Data yang digunakan dalam penelitian ini adalah twit berbahasa Indonesia yang dikumpulkan dari bulan Agustus 2021 hingga April 2023. Untuk analisis, digunakan pustaka LazyPredict yang menghasilkan lima model klasifikasi, yaitu Bernoulli Naive Bayes (BernoulliNB), Nearest Centroid, Calibrated Classifier CV, Logistic Regression, dan Linear Support Vector Classification (LinearSVC). Hasil menunjukkan bahwa model BernoulliNB memiliki performa terbaik dengan nilai rata-rata F1 sebesar 0,788. Selain itu, penelitian ini juga mengidentifikasi topik-topik yang dibahas terkait dengan Metaverse menggunakan pustaka Bertopic. Temuan menunjukkan adanya topik negatif seperti ketidakpastian pengembangan Metaverse, skeptisisme terhadap teknologi baru, keterbatasan infrastruktur internet, kekhawatiran etika dan syariah, ketidakpastian legalitas, kekhawatiran privasi dan keamanan, serta skeptisisme terhadap kesiapan Indonesia dalam membangun Metaverse. Di sisi lain, topik positif meliputi peluncuran Metaverse Jagat Nusantara, potensi kripto dalam konteks Metaverse, perubahan nama Facebook menjadi Meta, konser virtual di Metaverse, kehidupan di dunia Metaverse, pengembangan teknologi Metaverse di dalam negeri, transformasi digital dan inovasi di era Metaverse, penggunaan blockchain, kripto, dan NFT dalam teknologi Metaverse, serta Manasik Haji di Metaverse. 
Hasil analisis sentimen dan pemodelan ini dapat memberikan wawasan yang berharga bagi Universitas T dalam memahami tren dan pandangan masyarakat terkait Metaverse. Hal ini akan membantu universitas dalam mengevaluasi roadmap Metaverse yang telah dibuat untuk memastikan kesesuaiannya dengan kebutuhan masyarakat.

Universitas T has a roadmap to develop various types of Metaverse in the future. However, there are concerns that the existing roadmap may not align with the needs of society. Therefore, this research aims to analyze the sentiment and topic modeling related to Metaverse on social media to provide valuable insights for the development roadmap of Metaverse at Universitas T, taking into account the opinions and sentiments of the public. The data used in this study are Indonesian tweets collected from August 2021 to April 2023. The LazyPredict library is utilized for analysis, which generates five classification models: Bernoulli Naive Bayes (BernoulliNB), Nearest Centroid, Calibrated Classifier CV, Logistic Regression, and Linear Support Vector Classification (LinearSVC). The results show that the BernoulliNB model performs the best with an F1 score of 0.788. Additionally, this research identifies various topics discussed in relation to Metaverse using Bertopic library. Findings indicate the presence of negative topics such as uncertainty in Metaverse development, skepticism towards new technologies, limitations of internet infrastructure, ethical and Sharia concerns, legal uncertainties, privacy and security concerns, as well as skepticism about Indonesia's readiness in building the Metaverse. On the other hand, positive topics include the launch of Metaverse Jagat Nusantara, the potential of cryptocurrencies in the context of Metaverse, the name change of Facebook to Meta, virtual concerts in the Metaverse, life in the Metaverse world, domestic Metaverse technology development, digital transformation and innovation in the era of Metaverse, the use of blockchain, cryptocurrencies, and NFTs in Metaverse technology, as well as Manasik of Hajj in the Metaverse. The results of sentiment analysis and topic modeling can provide valuable insights for Universitas T to understand the trends and public perspectives regarding Metaverse. 
This will assist the university in evaluating the existing Metaverse roadmap to ensure its alignment with the needs of society."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Tubagus Ahmad Marzuqi

Deteksi Secara Otomatis Mahasiswa Drop-Out dan Terlambat Lulus: Studi Kasus Universitas Kristen Krida Wacana (Ukrida) = Automatic Detection of Drop-out and Late Graduate Students: A Case Study of Krida Wacana Christian University (Ukrida)

"Universitas Kristen Krida Wacana (UKRIDA) adalah salah satu perguruan tinggi swasta di Indonesia. UKRIDA secara periodik mengikuti proses akreditasi dan klaster universitas. Salah satu poin penilaian adalah kelulusan tepat waktu. Sayangnya, potensi terjadinya mahasiswa terlambat lulus atau drop out masih menjadi tantangan bagi organisasi. Untuk dapat melakukan tindakan mitigasi dan menyusun strategi retensi, perlu dilakukan prediksi terhadap mahasiswa yang berpeluang drop out (DO) dan terlambat lulus menggunakan data informasi dasar akademik. Hal tersebut dilakukan untuk membantu proses pengecekan mahasiswa DO yang sebelumnya masih manual. Selain itu, faktor informasi dasar akademik apa saja yang memengaruhi hasil prediksinya. Model yang dibangun menggunakan algoritma-algoritma yang diantaranya Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, dan Gradient Boosting. Kontribusi praktis pada penelitian ini adalah UKRIDA dapat menggunakan hasil penelitian ini untuk diimplementasikan pada sistem sehingga dapat memudahkan Koordinator Pangkalan Data UKRIDA dalam melakukan pengecekan secara otomatis. Kontribusi Teoritis pada penelitian ini, diharapkan dapat memberikan rekomendasi untuk akademis dalam membangun aspek teoritis terkait deteksi mahasiswa DO dan terlambat lulus. Hasilnya data yang digunakan untuk mendeteksi mahasiswa DO berhasil mencapai 99,42% pada metric precision dan 98,58% pada average precision. Data yang digunakan untuk mendeteksi mahasiswa terlambat lulus berhasil mencapai 78,51% pada metric precision dan AUC 82,86%. Faktor-faktor yang memengaruhi mahasiswa DO adalah status bayar karena terdapat mahasiswa yang hutang terprediksi DO, IPK dengan rata-rata dibawah 2 diprediksi DO, jumlah ulang mata kuliah di atas 1, tidak KRS di atas 2. 
Namun pada deteksi mahasiswa terlambat lulus, faktor-faktor yang memengaruhi hal tersebut adalah terdapat data yang lebih dari 1 Tidak KRS dan 24 kali mengulang mata kuliah serta dengan status bayar Hutang.

Krida Wacana Christian University (UKRIDA) is a private university in Indonesia. UKRIDA periodically undergoes accreditation and university clustering. One of the assessment points is on-time graduation. Unfortunately, students graduating late or dropping out remains a challenge for the organization. To take mitigation actions and develop retention strategies, it is necessary to predict which students are likely to drop out (DO) or graduate late using basic academic information data. This supports the process of checking DO students, which was previously manual. The study also examines which basic academic information factors affect the predictions. The models are built using algorithms including Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, and Gradient Boosting. The practical contribution is that UKRIDA can implement the results in its system, making it easier for the UKRIDA Database Coordinator to perform checks automatically. The theoretical contribution is expected to be recommendations for academics developing theoretical aspects of detecting DO and late-graduating students. The data used to detect DO students reached 99.42% precision and 98.58% average precision. The data used to detect late-graduating students reached 78.51% precision and 82.86% AUC. The factors that affect DO predictions are payment status, because students with outstanding debt are predicted to drop out; a GPA averaging below 2; more than 1 repeated course; and no KRS more than 2 times. For late graduation, the influencing factors are no KRS more than 1 time, 24 repeated courses, and a payment status of Debt."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2022

S-pdf

UI - Tesis Membership Universitas Indonesia Library

Nurriasih Fatimah

Analisis Kesehatan Mental pada Media Sosial Berbahasa Indonesia = Analysis of Mental Health on Indonesian Social Media

"Di Indonesia, gangguan mental merupakan kontributor beban penyakit terendah, tetapi menjadi penyebab kecacatan utama jika dibandingkan dengan penyakit kardiovaskuler, neoplasma, maternal dan neonatal, juga infeksi pernafasan dan TB. Di media sosial, banyak pengguna melakukan diskusi dan membagikan konten edukatif mengenai kesehatan mental. Pengguna yang merupakan penderita gangguan mental juga banyak yang melakukan self reported diagnoses. Penelitian ini menggunakan data yang berasal dari Twitter yang akan digunakan untuk membangun model klasifikasi, analisis faktor apa yang menyebabkan sebuah tweet dapat diklasifikasikan sebagai tweet yang merefleksikan gangguan mental, dan menganalisis tweet yang merefleksikan gangguan mental. Model klasifikasi yang dibangun adalah model relevansi untuk menentukan relevansi dari suatu tweet dan model kategori untuk mengkategorikan tweet yang relevan ke dalam empat kategori, yaitu selfdiagnosed, terindikasi, penderita, dan penyintas. Model relevansi terbaik adalah model yang dibangun menggunakan Random Forest dan CountVectorizer unigram dengan hasil evaluasi yang didapatkan, yaitu akurasi 89,93%, precission 90,56%, recall 89,92%, dan f1-score 90%, sedangkan model kategori terbaik adalah model yang dibangun menggunakan Logistic Regression, TfidfVectorizer bigram, dan SMOTE dengan hasil evaluasi yang didapatkan adalah akurasi 83,62%, precission 83,22%, recall 83,61%, dan f1-score 81,98%. Faktor yang membuat sebuah tweet dapat diklasifikasikan sebagai tweet yang merefleksikan gangguan mental adalah fitur yang dimiliki oleh tweet karena setiap tweet memiliki karakteristik fiturnya masing-masing. 
Implikasi teoritis dari penelitian ini adalah penelitian ini dapat digunakan sebagai referensi untuk melakukan penelitian yang terkait analitika media sosial, terutama penelitian yang memiliki tema tentang kesehatan mental, sedangkan implikasi praktikal adalah hasil penelitian ini dapat dimanfaatkan sebagai data sekunder pada sistem informasi mengenai kesehatan mental yang dikembangkan oleh organisasi terkait dan dapat dimanfaatkan sebagai referensi tambahan dalam menangani masalah kesehatan mental di Indonesia.

In Indonesia, mental disorders are the lowest contributor to the burden of disease but are the leading cause of disability when compared with cardiovascular diseases, neoplasms, maternal and neonatal conditions, respiratory infections, and TB. On social media, many users discuss and share educational content about mental health. Many users with mental disorders also post self-reported diagnoses. This study uses data from Twitter to build classification models, analyze which factors cause a tweet to be classified as reflecting mental disorders, and analyze the tweets that reflect mental disorders. The classification models are a relevance model, which determines the relevance of a tweet, and a category model, which assigns relevant tweets to four categories: self-diagnosed, indicated, sufferers, and survivors. The best relevance model is built using Random Forest and CountVectorizer unigrams, with 89.93% accuracy, 90.56% precision, 89.92% recall, and 90% f1-score. The best category model is built using Logistic Regression, TfidfVectorizer bigrams, and SMOTE, with 83.62% accuracy, 83.22% precision, 83.61% recall, and 81.98% f1-score. The factor that allows a tweet to be classified as reflecting mental disorders is the tweet's features, because each tweet has its own feature characteristics. The theoretical implication is that this research can serve as a reference for research on social media analytics, especially on the theme of mental health. The practical implication is that the results can be used as secondary data in mental health information systems developed by related organizations and as an additional reference in dealing with mental health problems in Indonesia."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2022

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Tubagus Ahmad Marzuqi

Deteksi secara otomatis mahasiswa drop-out dan terlambat lulus: studi kasus Universitas Kristen Krida Wacana (Ukrida) = Automatic detection of drop-out and late graduate students: a case study of Krida Wacana Christian University (Ukrida)

"Salah satu poin penilaian akreditasi universitas adalah jumlah mahasiswa yang lulus tepat waktu dan mahasiswa yang drop-out (DO). Sayangnya, potensi terjadinya mahasiswa terlambat lulus atau drop out masih menjadi tantangan bagi Universitas Kristen Krida Wacana (UKRIDA). Untuk dapat melakukan tindakan mitigasi dan menyusun strategi retensi, perlu dilakukan prediksi terhadap mahasiswa yang berpeluang DO dan terlambat lulus menggunakan data informasi akademik. Hal tersebut dilakukan untuk membantu proses pengecekan mahasiswa DO yang sebelumnya masih manual. Selain itu, faktor informasi akademik apa saja yang memengaruhi hasil prediksinya. Model yang dibangun menggunakan algoritma-algoritma yang di antaranya Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, dan Gradient Boosting. Hasilnya data yang digunakan untuk mendeteksi mahasiswa DO berhasil mencapai 99,42% pada metric precision dan 98,58% pada average precision. Data yang digunakan untuk mendeteksi mahasiswa terlambat lulus berhasil mencapai 78,51% pada metric precision dan AUC 82,86%. Faktor-faktor yang memengaruhi mahasiswa DO adalah status bayar karena terdapat mahasiswa yang hutang terprediksi DO, IPK dengan rata-rata di bawah 2 diprediksi DO, jumlah ulang mata kuliah di atas 1, tidak KRS di atas 2. Namun pada deteksi mahasiswa terlambat lulus, faktor-faktor yang memengaruhi hal tersebut adalah terdapat data yang lebih dari 1 Tidak KRS dan 24 kali mengulang mata kuliah serta dengan status bayar Hutang.

One of the criteria in university accreditation assessment is the number of students who graduate on time and the number who drop out (DO). Unfortunately, the risk of students graduating late or dropping out remains a challenge for Krida Wacana Christian University (UKRIDA). To take mitigating action and develop retention strategies, students who are likely to drop out or graduate late need to be predicted from academic information data. This supports the process of checking for DO students, which was previously manual. The study also examines which academic information factors affect the predictions. The models were built using Logistic Regression, Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, and Gradient Boosting. Detection of DO students reached 99.42% precision and 98.58% average precision. Detection of late-graduating students reached 78.51% precision and 82.86% AUC. The factors that affect DO predictions are payment status (students in debt were predicted to drop out), an average GPA below 2, more than 1 repeated course, and a no-KRS count above 2. For late-graduation detection, however, the influencing factors are a no-KRS count above 1, 24 repeated courses, and a payment status of Debt."
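The abstract reports both precision and average precision. The sketch below shows how the two differ, using standard definitions in plain Python: precision is computed at one decision threshold, while average precision summarizes the whole ranking of scores. The labels, scores, and function names are hypothetical and are not drawn from the study; the ranking-based formula assumes there are no tied scores.

```python
def precision_at_threshold(y_true, scores, threshold):
    """Precision when every score >= threshold is predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(y_true, predicted) if p and y == 0)
    return tp / (tp + fp) if tp + fp else 0.0


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each true positive.

    Assumes binary labels (0/1) and no tied scores.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` items
    positives = sum(y_true)
    return total / positives if positives else 0.0


# Hypothetical data: 1 = dropped out, 0 = did not; scores are model outputs.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
p = precision_at_threshold(labels, scores, threshold=0.5)
ap = average_precision(labels, scores)
```

Because average precision does not depend on a single cutoff, it is often reported alongside threshold-based precision when the positive class is rare.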

Jakarta: Fakultas Ilmu Komputer Universitas Indonesia, 2022

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Putra Tresna Linge

Deteksi konten negatif (hoax) pada data microblog yang mengandung informasi COVID-19 = Negative content (hoax) detection on microblog data that contain COVID-19 information

"Selama beberapa tahun terakhir, jumlah penyebaran informasi semakin meningkat terutama semenjak adanya media sosial. Di antara informasi yang beredar terdapat informasi yang termasuk konten negatif atau hoax yang memiliki dampak yang buruk seperti timbulnya perpecahan akibat informasi yang tidak benar. Berdasarkan laporan kinerja kominfo tahun 2018, media sosial twitter merupakan penyumbang terbanyak penyebaran hoax. Untuk mengurangi dampak dari penyebaran hoax, diperlukan suatu metode untuk mendeteksi hoax pada twitter sehingga dapat dilakukan pencegahan seperti melakukan “take down” pada tweet yang termasuk hoax. Tujuan dilakukannya penelitian ini yaitu untuk mengembangkan sebuah model yang mampu mendeteksi konten negatif (hoax) secara otomatis dan juga melihat korelasi antara konten yang berupa hoax dengan orientasi sentimennya. Hasil dari penelitian ini yaitu berupa model yang berbasis pembelajaran mesin dengan menggunakan algoritma decision tree dengan akurasi 97,2% dengan nilai precision 85,4, recall 81,4 dan f1-score 93. Selain itu hasil analisis menunjukkan bahwa tweet yang merupakan hoax hasil identifikasi model didominasi oleh orientasi sentimen positif yaitu 52,64% dari total keseluruhan data yang diidentifikasi sebagai hoax. Implikasi praktikal dari penelitian ini berupa model deteksi hoax yang dapat digunakan sebagai alat bantu dalam proses penurunan penyebaran hoax. Sedangkan implikasi teoritis dari penelitian ini berupa data set, alur pembuatan model serta model yang dapat digunakan untuk penelitian berikutnya khususnya dalam bidang analitika media sosial dan digital.

Over the past few years, the spread of information has increased, especially since the advent of social media. Some of the information in circulation is negative content, or hoaxes, which have harmful effects such as social division caused by false information. According to the 2018 Kominfo performance report, Twitter is the largest contributor to the spread of hoaxes. To reduce the impact of hoaxes, a method is needed to detect them on Twitter so that preventive action can be taken, such as taking down tweets that are hoaxes. The purpose of this research is to develop a model that can detect negative content (hoaxes) automatically and to examine the correlation between hoax content and its sentiment orientation. The result is a machine learning model using a decision tree algorithm with an accuracy of 97.2%, precision of 85.4, recall of 81.4, and F1-score of 93. In addition, the analysis shows that tweets identified as hoaxes by the model are dominated by positive sentiment, at 52.64% of all data identified as hoaxes. The practical implication of this research is a hoax detection model that can be used as a tool for reducing the spread of hoaxes. The theoretical implications are the data set, the modeling workflow, and the model, which can be used in further research, especially in social and digital media analytics."
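The abstract reports accuracy, precision, recall, and F1-score. For readers unfamiliar with how these relate, the sketch below derives all four from a binary confusion matrix using the standard definitions. The counts and the function name are hypothetical, chosen only to show that accuracy can be high while precision and recall are lower when the positive class (hoax) is rare; they are not the study's data, and the abstract does not say how its scores were averaged.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 1000 tweets, of which 100 are hoaxes.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=890)
```

For a single class, F1 computed this way always lies between precision and recall.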

Jakarta: Fakultas Ilmu Komputer Universitas Indonesia, 2022

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Ghanim Kanugrahan

Analisis Sentimen Terhadap Kembalinya Pembelajaran dengan Sistem Tatap Muka Melalui Media Sosial Twitter = Sentiment Analysis of Face-to-Face Learning Systems through Social Media Twitter

"Pandemi Covid-19 telah melanda Indonesia selama lebih dari satu tahun. Hal tersebut menyebabkan terhentinya kegiatan normal di berbagai sektor kehidupan masyarakat, khususnya dalam dunia pendidikan. Setelah lebih dari satu tahun menutup kegiatan belajar tatap muka, Pemerintah kembali merencanakan kembalinya pendidikan tatap muka. Meskipun pendidikan tatap muka dinilai lebih efektif, akan tetapi bahaya Covid-19 yang semakin mudah menular menyebabkan kekhawatiran di dalam masyarakat. Untuk itu, pemerintah wajib menampung aspirasi rakyatnya. Salah satunya adalah dengan menggunakan metode sentimen analisis. Dengan mengkombinasikan feature yang terdapat pada data Twitter, maka kita bisa membangun sebuah model untuk mengklasifikasi opini masyarakat. Penelitian ini juga membandingkan algoritma machine learning Support Vector Machine (SVM) dan Multi-layer Perceptron (MLP). Hasilnya, penambahan feature dan penggunaan algoritma (SVM) dalam mengklasifikasikan model Sentiment-Neutral menghasilkan nilai akurasi dan F-1 Score terbaik (85,78% dan 81,0%). Selain itu, visualisasi menggunakan Scattertext berhasil merepresentasikan teks dalam suatu plot. Hasilnya adalah mayoritas masyarakat yang mendukung kembalinya pendidikan tatap muka berdasarkan kepercayaan bahwa pendidikan tatap muka lebih efektif dibandingkan dengan pendidikan online. Di sisi lain, masyarakat juga takut akan bahayanya virus Covid-19.

The Covid-19 pandemic has affected Indonesia for more than a year. It has halted normal activities in many sectors of society, especially education. After suspending face-to-face learning for more than a year, the Government is now planning its return. Although face-to-face education is considered more effective, the increasingly transmissible Covid-19 is causing concern among the public. The government therefore needs to accommodate the aspirations of the people, and one way to do so is sentiment analysis. By combining the features available in Twitter data, a model can be built to classify public opinion. This study also compares two machine learning algorithms, Support Vector Machine (SVM) and Multi-layer Perceptron (MLP). Adding features and using the SVM algorithm in the Sentiment-Neutral model produced the best accuracy and F1 score (85.78% and 81.0%). In addition, visualization with Scattertext successfully represented the text in a plot. The results show that the majority of people support the return of face-to-face education, based on the belief that it is more effective than online education. On the other hand, people are also afraid of the dangers of the Covid-19 virus."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2021

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

