Search Results

Winter, Patricia de

Starting out in statistics : an introduction for students of human health, disease and psychology

Chichester: Wiley Blackwell, 2014

610.21 WIN s

Buku Teks SO Universitas Indonesia Library

Tuwaji

A Study of bibliometric and topic distribution of special collection for academic society of STAIN Al-Fatah Jayapura 2007-2017 (IAIN Fattahul Muluk Papua)

"This study of bibliometric and topic distribution of special collection of STAIN Al Fatah Jayapura library covered scientific books written by lecturers, research reports, and articles of the Jabal Hikmah journal"

Jakarta: Pusat Jasa Perpustakaan dan Informasi, 2019

020 VIS 21:1 (2019)

Artikel Jurnal Universitas Indonesia Library

Maxwell, Robert L.

FRBR: a guide for the perplexed

"FRBR (Functional Requirements for Bibliographic Records) is an evolving conceptual model designed to help users easily navigate catalogs and find the material they want in the form they want it, be that print, DVD, audio, or adaptations. Developed by the International Federation of Library Associations and Institutions Cataloging Section, FRBR is now being integrated into cataloging theory and implemented into systems and practice."

Chicago: American Management Association, 2008

e20437556

eBooks Universitas Indonesia Library

Chaluemwut Noyunsan

A social network newsworthiness filter based on topic analysis / Chaluemwut Noyunsan, Tatpong Katanyukul, Yuqing Wu, Kanda Runapongsa Saikaew

"Assessing the trustworthiness of social media posts is increasingly important as the number of online users and activities grows. Currently deployed assessment systems measure post trustworthiness as credibility. However, they measure the credibility of all posts indiscriminately. The credibility concept was intended for news-type posts, and labeling other types of posts with credibility scores may confuse users. Previous notable works envisioned filtering out non-newsworthy posts before credibility assessment as a key factor towards a more efficient credibility system. Thus, we propose a topic-based supervised learning approach that uses Term Frequency-Inverse Document Frequency (TF-IDF) and cosine similarity to filter out posts that do not need credibility assessment. Our experimental results show that users agree with about 70% of the proposed filtering suggestions. Such results support the notion of newsworthiness introduced in the pioneering work on credibility assessment. The topic-based supervised learning approach is shown to provide a viable social network filter."

2016

J-Pdf

Artikel Jurnal Universitas Indonesia Library
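
The filtering idea in the abstract above (TF-IDF vectors compared by cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the two example news posts, the 0.2 threshold, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; the abstract does not specify one).
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency over a list of token lists.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    # Terms that never occur in the reference corpus are ignored.
    tf = Counter(t for t in tokens if t in idf)
    return {term: freq * idf[term] for term, freq in tf.items()}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_newsworthy(post, reference_vectors, idf, threshold=0.2):
    # Keep a post only if it is similar enough to at least one labeled news example.
    vec = tfidf(tokenize(post), idf)
    return max(cosine(vec, ref) for ref in reference_vectors) >= threshold


# Hypothetical labeled news examples standing in for the supervised training data.
news_examples = [
    "government announces new flood relief budget",
    "earthquake hits coastal city officials report damage",
]
tokenized = [tokenize(d) for d in news_examples]
idf = build_idf(tokenized)
reference_vectors = [tfidf(doc, idf) for doc in tokenized]
```

With these assumed examples, a post such as "officials report flood damage in city" passes the filter, while "had a great lunch with friends today" shares no vocabulary with the news examples and is filtered out before any credibility assessment.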

Raden Trivan Sutrisman

Analisis metode inisialisasi pada algoritma eigenspace based fuzzy c-means untuk pendeteksian topik berita online Indonesia = Analysis of initialization methods on eigenspace based fuzzy c-means algorithm for Indonesian online news topic detection

"ABSTRAK

Perkembangan berita online di Indonesia saat ini sudah semakin meningkat sehingga kebutuhan dalam melakukan analisis data berita sangat diperlukan untuk mendapatkan intisari informasi yang akurat dan cepat. Topik merupakan komponen dasar yang sering digunakan untuk menganalisis data dalam bentuk teks seperti berita. Dengan menggunakan pemodelan topik, dapat dilakukan pendeteksian topik secara otomatis pada koleksi dokumen berita yang sangat besar dan sulit dilakukan secara manual oleh manusia. Salah satu pemodelan topik yang dapat digunakan adalah metode clustering menggunakan Eigenspace Based Fuzzy C-Means (EFCM). Metode EFCM pada umumnya menggunakan inisialisasi random. Pada penelitian ini akan diimplementasikan metode inisialisasi menggunakan Non-Negative Double Singular Value Decomposition (NNDSVD) dan Fuzzy C-Means++ (FCM++) sebagai alternatif metode inisialisasi pada algoritma EFCM. Hasil simulasi menggunakan inisialisasi NNDSVD dan FCM++ menunjukkan nilai akurasi yang lebih baik dalam hal tingkat interpretabilitas topik daripada metode random.

ABSTRACT
The rapid growth of online news in Indonesia creates a need for news analysis that extracts the essential information quickly and accurately. Topics are basic components often used to analyze textual data such as news articles. With topic modeling, topics can be detected automatically in very large collections of news documents, a task that is difficult to perform manually. One topic modeling approach is the clustering-based method Eigenspace-based Fuzzy C-Means (EFCM). EFCM is commonly initialized randomly. In this research, Non-Negative Double Singular Value Decomposition (NNDSVD) and Fuzzy C-Means++ (FCM++) are used as alternative initialization methods for EFCM. The simulations show that NNDSVD and FCM++ initialization give better accuracy, in terms of topic interpretability, than random initialization."

Depok: Universitas Indonesia, 2018

T50041

UI - Tesis Membership Universitas Indonesia Library

Kellar, Stacey Plichta

Munro's statistical methods for health care research

"This text provides students with a solid foundation for understanding data analysis and specific statistical techniques. Focusing on the most current and frequently used statistical methods in today's health care literature, the book covers essential material for a variety of program levels, including in-depth courses beyond the basic statistics course. Well-organized, clear text discussions and great learning tools help students overcome the complexities and fully comprehend the concepts of this often intimidating area of study."

Philadelphia: Wolters Kluwer Health; Lippincott Williams & Wilkins, 2013

610.727 KEL m

Buku Teks SO Universitas Indonesia Library

Chaluemwut Noyunsan

A social network newsworthiness filter based on topic analysis

"Assessing the trustworthiness of social media posts is increasingly important as the number of online users and activities grows. Currently deployed assessment systems measure post trustworthiness as credibility. However, they measure the credibility of all posts indiscriminately. The credibility concept was intended for news-type posts, and labeling other types of posts with credibility scores may confuse users. Previous notable works envisioned filtering out non-newsworthy posts before credibility assessment as a key factor towards a more efficient credibility system. Thus, we propose a topic-based supervised learning approach that uses Term Frequency-Inverse Document Frequency (TF-IDF) and cosine similarity to filter out posts that do not need credibility assessment. Our experimental results show that users agree with about 70% of the proposed filtering suggestions. Such results support the notion of newsworthiness introduced in the pioneering work on credibility assessment. The topic-based supervised learning approach is shown to provide a viable social network filter."

Depok: Faculty of Engineering, Universitas Indonesia, 2016

UI-IJTECH 7:7 (2016)

Artikel Jurnal Universitas Indonesia Library

Julizar Isya Pandu Wangsa

Studi Perbandingan Metode Clustering K-Means, DBSCAN, dan HDBSCAN pada BERTopic untuk Pendeteksian Topik = Comparative Study of K-Means, DBSCAN, and HDBSCAN Clustering Methods on BERTopic for Topic Detection

"Pendeteksian topik merupakan suatu proses pengidentifikasian suatu tema sentral yang ada dalam kumpulan dokumen yang luas dan tidak terorganisir. Hal ini merupakan hal sederhana yang bisa dilakukan secara manual jika data yang ada hanya sedikit. Untuk data yang banyak dibutuhkan pengolahan yang tepat agar representasi topik dari setiap dokumen didapat dengan cepat dan akurat sehingga machine learning diperlukan. BERTopic adalah metode pemodelan topik yang memanfaatkan teknik clustering dengan menggunakan model pre-trained Bidirectional Encoder Representations from Transformers (BERT) untuk melakukan representasi teks dan Class based Term Frequency Invers Document Frequency (c-TF-IDF) untuk ekstraksi topik. Metode clustering yang digunakan pada penelitian ini adalah metode K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), dan Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). BERT dipilih sebagai metode representasi teks pada penelitian ini karena BERT merepresentasikan suatu kalimat berdasarkan sequence-of-word dan telah memperhatikan aspek kontekstual kata tersebut dalam kalimat. Hasil representasi teks merupakan vektor numerik dengan dimensi yang besar sehingga perlu dilakukan reduksi dimensi menggunakan Uniform Manifold Approximation and Projection (UMAP) sebelum clustering dilakukan. Model BERTopic dengan tiga metode clustering ini akan dianalisis kinerjanya berdasarkan matrik nilai coherence, diversity, dan quality score. Nilai quality score merupakan perkalian dari nilai coherence dengan nilai diversity. Hasil simulasi yang didapat adalah model BERTopic menggunakan metode clustering K-Means lebih unggul 2 dari 3 dataset untuk nilai quality score dari kedua metode clustering yang ada.

Topic detection is the process of identifying the central themes in a large, unorganized collection of documents. It is simple enough to do manually when there is only a small amount of data. For large amounts of data, proper processing is needed to obtain the topic representation of each document quickly and accurately, so machine learning is required. BERTopic is a topic modeling method that uses clustering together with pre-trained Bidirectional Encoder Representations from Transformers (BERT) models for text representation and class-based Term Frequency-Inverse Document Frequency (c-TF-IDF) for topic extraction. The clustering methods used in this research are K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). BERT was chosen as the text representation method because it represents a sentence as a sequence of words and takes into account the context of each word in the sentence. The resulting text representation is a high-dimensional numeric vector, so its dimensionality is reduced using Uniform Manifold Approximation and Projection (UMAP) before clustering. The performance of the BERTopic model with the three clustering methods is analyzed using the coherence, diversity, and quality score metrics. The quality score is the product of the coherence value and the diversity value. The simulation results show that BERTopic with K-Means clustering achieves a better quality score than the other two clustering methods on 2 of the 3 datasets."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2023

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
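
The abstract above defines the quality score as the product of coherence and diversity. A minimal Python sketch of that arithmetic follows. The abstract does not say how diversity is computed, so the definition used here (share of unique words among all topics' top words) is an assumption, as are the function names; the coherence value is assumed to come from a separate evaluation step.

```python
def topic_diversity(topics):
    # Assumed definition: unique top words divided by total top words across topics.
    words = [word for topic in topics for word in topic]
    return len(set(words)) / len(words)


def quality_score(coherence, diversity):
    # Quality score as stated in the abstract: coherence multiplied by diversity.
    return coherence * diversity


# Hypothetical top words for two topics; "b" is shared, so diversity is 3/4.
example_topics = [["a", "b"], ["b", "c"]]
example_diversity = topic_diversity(example_topics)
example_quality = quality_score(0.5, example_diversity)
```

Because the score is a product, a model with highly coherent but heavily overlapping topics is penalized in the same way as one with distinct but incoherent topics.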

Dian Isnaeni Nurul Afra

Analisis Sentimen dan Pemodelan Topik dengan Data Media Sosial Twitter: Studi Kasus Komisi Pemberantasan Korupsi = Sentiment Analysis and Topic Modelling using Twitter Social Media Data: A Case Study of the Corruption Eradication Commission

"Komisi Pemberantasan Korupsi (KPK) memiliki kewenangan dalam melakukan pendaftaran dan pemeriksaan terhadap Laporan Harta Kekayaan Penyelenggara Negara (LHKPN). Pelaporan ini berfungsi untuk melakukan pengawasan kejujuran, integritas, dan deteksi kemungkinan adanya tindakan memperkaya diri secara melawan hukum oleh pejabat publik. Publikasi LHKPN sering menimbulkan prasangka negatif dan kecurigaan publik terhadap laporan harta kekayaan pejabat yang mengakibatkan kekhawatiran pejabat untuk melaporkan harta kekayaan secara lengkap dan benar. Persepsi ini menjadi kontraproduktif dengan upaya pencegahan korupsi yang dilakukan oleh KPK apabila tidak direspon dengan cepat. Penelitian ini bertujuan untuk membuat model analisis sentimen dan pemodelan topik yang dapat mengeksplorasi topik dari data media sosial Twitter. Indonesia memiliki jumlah pengguna aktif terbesar keenam di dunia dengan 15,7 juta pengguna yang didominasi kelompok usia 25-34 tahun. Dataset sejumlah 881 data diambil dari Twitter dengan kata kunci "lhkpn" dan "harta kekayaan pejabat" pada periode 1 Agustus sampai 5 November 2021. Penelitian ini mengekplorasi beberapa algoritma klasifikasi, representasi fitur unigram, bigram, dan trigram dengan CountVectorizer dan TFIDF, serta metode oversampling SMOTE. Algoritma klasifikasi dengan performa paling baik pada penelitian ini adalah Multilayer Perceptron dengan fitur unigram CountVectorizer dan metode oversampling dengan accuracy 76,60%, precision 78,19%, recall 76,60%, dan F1 score 76,95%. Hasil pemodelan topik menggunakan Latent Dirichlet Allocation pada kategori ‘negatif’ didominasi ekspresi kekecewaan dan kemarahan masyarakat terhadap meningkatnya harta kekayaan pejabat selama masa pandemi Covid-19 yang berbanding terbalik dengan meningkatnya utang negara dan kesulitan yang dihadapi masyarakat selama pandemi. 
Topik yang dihasilkan pada kategori ‘positif’ cukup beragam mulai dari aturan untuk melakukan pembuktian terbalik, usulan mengenai kewajiban pelaporan dan sanksi, permintaan untuk membuka laporan kekayaan kepada publik, serta pembahasan mengenai kewajaran penambahan harta kekayaan yang disebabkan oleh meningkatnya nilai aset tidak bergerak.

The Corruption Eradication Commission (KPK) has the authority to register and examine Public Officials' Wealth Reports (LHKPN). This reporting serves to monitor honesty and integrity and to detect possible illegal enrichment by public officials. Publication of LHKPN often creates negative prejudice and public suspicion toward officials' wealth reports, which makes officials wary of reporting their assets completely and correctly. This perception becomes counterproductive to the KPK's corruption prevention efforts if it is not addressed quickly. This study aims to build a sentiment analysis model and a topic model that can explore topics in Twitter social media data. Indonesia has the sixth-largest number of active users in the world, 15.7 million, dominated by the 25-34 age group. A dataset of 881 records was collected from Twitter with the keywords "lhkpn" and "harta kekayaan pejabat" (officials' assets) for the period 1 August to 5 November 2021. This study explores several classification algorithms; unigram, bigram, and trigram feature representations with CountVectorizer and TFIDF; and the SMOTE oversampling method. The best-performing classification algorithm is the Multilayer Perceptron with unigram CountVectorizer features and oversampling, with 76.60% accuracy, 78.19% precision, 76.60% recall, and a 76.95% F1 score. The topics produced by Latent Dirichlet Allocation for the 'negative' category are dominated by expressions of public disappointment and anger at the increase in officials' wealth during the Covid-19 pandemic, in contrast to rising state debt and the hardships faced by the public during the pandemic.
The topics generated for the 'positive' category are quite diverse, ranging from rules on reversing the burden of proof, proposals on reporting obligations and sanctions, and requests to open wealth reports to the public, to discussion of whether increases in wealth caused by the rising value of immovable assets are reasonable."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2022

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Hanif Sudira

Pembuatan model analisis sentimen untuk perhitungan brand reputation serta pemanfaatan topic modelling pada layanan Indihome menggunakan data Twitter dan Instagram = Creating a sentiment analysis model for calculation of brand reputation and utilization of topic modelling on Indihome services using Twitter and Instagram data

"Peran internet semakin penting dalam berbagai aspek kehidupan masyarakat. Kebutuhan akan internet menjadi peluang bagi penyedia internet, salah satunya Telkom dengan IndiHome. Sebagai BUMN, Telkom berperan sebagai penyedia layanan internet untuk memenuhi kebutuhan masyarakat. Berdasarkan survei kepuasan pelanggan tahun 2019 dan 2020, NPS IndiHome tidak mencapai target. Dari target besar atau sama dengan 5, tahun 2019 dan 2020, NPS IndiHome sebesar -1,67 dan 2,87. Hal ini karena pengerjaan permasalahan masih berdasarkan laporan, belum memiliki cara untuk mengetahui permasalahan yang terjadi dan belum memanfaatkan opini media sosial karena masih memanfaatkan survei. Penelitian ini membangun model analisis sentimen dam topic modelling IndiHome pada twitter & instagram. Data diambil dari bulan Maret 2019-April 2021. Model yang dihasilkan menggunakan metode SVM, twitter akurasi 70,13% dan instagram akurasi 73,55%. Sentimen mayoritas negatif, nilai NPS -79,49 pada twitter dan -56,12 pada Instagram. Dari twitter & instagram respons terhadap IndiHome memiliki indeks negatif, dimana masyarakat tidak puas dengan IndiHome. Hasil Topik diskusi negatif yaitu internet IndiHome mati mendadak, internet IndiHome lamban, internet IndiHome mati ketika terjadi hujan, biaya IndiHome mahal, pelayanan IndiHome tidak responsif, pelayanan IndiHome tidak solutif, sudah bayar internet diisolir, janji temu teknisi tidak sesuai waktu, dan ingin berhenti berlangganan atau pindah provider.

The internet plays an increasingly important role in many aspects of people's lives. This need is an opportunity for internet providers, one of which is Telkom with IndiHome. As a state-owned enterprise (BUMN), Telkom acts as an internet service provider to meet the needs of the community. Based on customer satisfaction surveys in 2019 and 2020, IndiHome's NPS did not reach its target: against a target of at least 5, IndiHome's NPS was -1.67 in 2019 and 2.87 in 2020. This is because problems are still handled based on reports, there is no way to find out which problems are occurring, and social media opinion is not yet used because the company still relies on surveys. This study builds sentiment analysis and topic modelling models for IndiHome on Twitter and Instagram. The data were taken from March 2019 to April 2021. The resulting models use the SVM method, with 70.13% accuracy on Twitter and 73.55% accuracy on Instagram. The majority sentiment is negative, with an NPS of -79.49 on Twitter and -56.12 on Instagram. On both Twitter and Instagram, the response to IndiHome has a negative index, meaning people are not satisfied with IndiHome. The negative discussion topics are: IndiHome internet going down suddenly, slow IndiHome internet, IndiHome internet going down when it rains, expensive IndiHome fees, unresponsive IndiHome service, IndiHome service that does not resolve problems, internet being cut off after payment, technician appointments not kept on time, and wanting to unsubscribe or switch providers."

Jakarta: Fakultas Ilmu Komputer Universitas Indonesia, 2022

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Nia Dwi Rahayuningtyas

Analisis Teks pada Tweet Berbahasa Indonesia untuk Mendeteksi Pro Kontra Vaksinasi Menggunakan Pendekatan Stance Detection dan Topic Modeling = Text Analytics on Indonesian Tweets to Detect Pro vs Anti Vaccination Using Stance Detection and Topic Modeling

"Keraguan dan penolakan orang tua terhadap vaksinasi meningkat secara global. Maraknya penyebaran isu vaksinasi melalui media sosial mengarahkan persepsi publik pada keraguan terhadap vaksin yang berujung pada penurunan cakupan imunisasi dan tidak tercapainya target IDL di Indonesia. Pada media sosial Twitter terdapat dua kelompok, yaitu kelompok pro-vaksin yang mendukung vaksinasi dan anti-vaksin yang menolak vaksinasi.

Penelitian ini bertujuan untuk mengidentifikasi apakah sebuah Tweet memiliki kecenderungan ke arah pro- atau anti-vaksin dan untuk mengeksplorasi topik-topik terkait pro-vaksin dan anti-vaksin. Dataset diambil dari Twitter dengan kata kunci "vaksin" dan "imunisasi" lebih dari 9.000 data Tweet antara 11 Agustus sampai 10 September 2019. Anotasi dilakukan dalam 3 langkah berturut-turut dengan tiga pasangan label yaitu RELEVANT/IRRELEVANT, SUBJECTIVE/NEUTRAL, dan PRO/ANTI. Tiga eksperimen yaitu pemilihan fitur, algoritma, dan pipeline klasifikasi dilakukan untuk mendapatkan model stance detection terbaik yaitu nilai rata-rata micro tertinggi dari precision, recall, dan f1-score.

Fitur terpilih adalah kombinasi 3 fitur teks Count +Unigram+Bigram dengan algoritma Logistic Regression dan pipeline Two-stage Classification (f1-score = 80,5%). Algoritma terpilih pada pembentukan topic modeling adalah NMF dan LDA masing-masing untuk korpus pro-vaksin dan anti-vaksin dengan nilai koherensi sebesar 0.999.

Topik-topik anti-vaksin meliputi kritik terhadap fatwa halal MUI untuk Vaksin MR, kandungan babi pada Vaksin Meningitis Haji, komersialisasi vaksin, vaksin palsu, KIPI dan bahaya vaksin, vaksin sebagai alat konspirasi dan agenda Yahudi, tuntutan vaksin halal, dan seterusnya. Sedangkan topik-topik pro-vaksin lebih bersifat homogen yaitu mengenai manfaat dan pentingnya imunisasi, aturan pemberian vaksin, dan kampanye dalam bentuk publisitas kegiatan imunisasi, dan anjuran vaksin.

Parents' hesitancy toward and refusal of immunization is rising globally. The spread of vaccination issues through social media steers public perception toward vaccine hesitancy, which leads to reduced immunization coverage and the unmet IDL target in Indonesia. There are two groups: pro-vaccine users who support vaccines and anti-vaccine users who refuse vaccines for various reasons, as expressed in tweets on Twitter.
This research aims to identify whether a tweet leans toward supporting or opposing vaccination and to explore the topics of the pro-vaccine and anti-vaccine corpora. The dataset of more than 9,000 tweets was taken from Twitter with the keywords "vaksin" and "imunisasi" between 11 August and 10 September 2019. Annotation was carried out in 3 consecutive steps with three label pairs, namely RELEVANT vs IRRELEVANT, SUBJECTIVE vs NEUTRAL, and PRO vs ANTI.
Three experiments, namely the selection of features, algorithms, and classification pipeline, were carried out to obtain the best stance detection model, that is, the one with the highest micro-averaged precision, recall, and f1-score. The selected configuration is a combination of Count+Unigram+Bigram text features with Logistic Regression and a Two-stage Classification pipeline (f1-score = 80.5%).
The selected topic modeling algorithms are NMF and LDA for the pro-vaccine and anti-vaccine corpora, respectively, with a coherence score of 0.999. Anti-vaccine topics include criticism of the MUI halal fatwa for the MR vaccine, pork gelatine in the Hajj Meningitis Vaccine, commercialization of vaccines, fake vaccines, KIPI and vaccine hazards, vaccines as part of a conspiracy and Jewish agenda, demands for halal vaccines, etc. Pro-vaccine topics, by contrast, are more homogeneous, namely the benefits and importance of immunization, vaccine administration rules, campaigns publicizing immunization activities, and vaccine recommendations."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2020

TA-Pdf

UI - Tugas Akhir Universitas Indonesia Library

Aditya Tejabaswara

Analisa media sosial twitter dengan perhitungan graph edit distance untuk mendeteksi rumor pada trending topic SIAK-NG = Implementation of graph edit distance for rumor detection on twitter trending topic: case study UI academic information system

"Pesatnya perkembangan teknologi disertai dengan tingkat penggunaannya membawa dampak positif di berbagai bidang kehidupan manusia, namun juga dapat membawa dampak negatif jika tidak didukung dengan tanggung jawab pengguna teknologi itu sendiri. Bidang telekomunikasi adalah salah satu bidang yang perkembangannya sangat dirasakan oleh manusia. Salah satu dari perkembangan telekomunikasi adalah lahirnya media sosial. Manusia menggunakan media sosial untuk berbagi informasi apapun kepada siapapun. Namun yang menjadi masalah kemudian adalah apakah informasi yang tersebar merupakan informasi yang nilai kebenarannya telah teruji atau hanya sebuah rumor. Rumor dapat saja mengakibatkan tersebarnya informasi yang salah di suatu golongan atau komunitas manusia.

Adapun topik yang terkait pada tugas akhir ini adalah siak-ng yang menjadi trending topic di media sosial twitter. Mengidentifikasi rumor pada media sosial online sangat krusial nilainya karena mudahnya informasi yang disebar oleh sumber yang tidak jelas.

Pada tugas akhir ini akan ditunjukkan salah satu cara pengidentifikasian rumor dengan menggunakan kalkulasi graph edit distance. Graph edit distance merupakan salah satu langkah yang paling cocok untuk menentukan persamaan antar grafik dan pengenalan pola jaringan kompleks. Untuk mencapai tujuan akhir, langkah-langkah yang dilakukan adalah pengambilan data, konversi data, pengolahan data, dan visualisasi. Dengan pengolahan data didapat sembilan padanan kata antara Parent Node dan Child Node serta 3 kategori edge label. Pada akhirnya ditemukan bahwa rumor sistem siak-ng sedang mengalami load tinggi merupakan rumor yang nilai kebenarannya tinggi.

The rapid development of technology, coupled with its growing use, brings positive impacts in many areas of human life, but can also have negative impacts if not supported by the responsibility of its users. Telecommunications is one area whose development is strongly felt. One outcome of that development is the birth of social media. People use social media to share any information with anyone. The issue, however, is whether the information being spread has been verified or is just a rumor. Rumors can lead to the spread of false information within a group or community.
The topic of this thesis is SIAK-NG, which became a trending topic on Twitter. Identifying rumors on online social media is crucial because information is so easily spread by unverified sources.
This thesis demonstrates one way of identifying rumors using graph edit distance calculations. Graph edit distance is one of the most suitable measures for determining the similarity between graphs and for pattern recognition in complex networks. To achieve this goal, the steps taken are data retrieval, data conversion, data processing, and visualization. Data processing yields nine word pairs between Parent Node and Child Node, along with 3 edge label categories. Finally, it was found that the rumor that the SIAK-NG system was under high load was a rumor with a high truth value."

Depok: Fakultas Teknik Universitas Indonesia, 2012

S42944

UI - Skripsi Open Universitas Indonesia Library

Dwie Putri Donnaro

Penggunaan Algoritma Clustering K-Means, DBScan, LDA, dan Kombinasi K-Means dengan DBScan untuk Menentukan Trending Topic pada Media Sosial X = Use of K-Means Clustering, DBScan, LDA, and Combination of K-Means with DBScan to Determine Trending Topic on Social Media X

"Masyarakat Indonesia sangat sering menggunakan media sosial twitter dan sekarang lebih dikenal dengan X untuk berbagi foto, video atau membuat tweet tentang topic yang sedang trend. Namun tidak banyak dari masyarakat Indonesia yang memanfaatkan trending topic ini untuk membuat konten dalam memasarkan produk barunya. Pada penelitian ini telah dilakukan pengelompokkan trending topic dengan menggunakan 3 algoritma clustering yaitu K-Means, DBScan dan LDA dengan menggunakan 2 kondisi yaitu Menggunakan Kata Kunci dan Tanpa Menggnakan kata Kunci, untuk kategori cluster telah ditentukan yaitu Cluster Politik, Cluster Ekonomi dan Cluster Pendidikan. Hasil penelitian ini adalah K-Means dengan menggunakan kata kunci lebih baik dari pada semuanya yaitu dengan nilai validitas 0,5810 sedangkan diposisi kedua yang termasuk baik adalah DBScan menggunakan kata kunci dengan nilai validitas 0,4656. Oleh karena itu karena hasilnya masih dalam tingkatan 2 yaitu struktur cluster masih dalam kategori baik, maka peneliti melakukan kombinasi antara K-Means dan DBScan dengan menggunakan kata kunci. Dan hasilnya struktur yang terbentuk masuk dalam tingkatan 1 yaitu dalam kategori kuat, nilai validitas yang dihasilkan yaitu 0,7864, sehingga antar trending topic dalam masing-masing cluster memiliki keterkaitan.

Indonesians very often use the social media platform Twitter, now better known as X, to share photos and videos or to tweet about trending topics. However, not many Indonesians use these trending topics to create content for marketing their new products. In this study, trending topics were grouped using 3 clustering algorithms, namely K-Means, DBScan, and LDA, under 2 conditions, With Keywords and Without Keywords; the cluster categories were predetermined as the Political Cluster, Economic Cluster, and Education Cluster. The results show that K-Means with keywords performs best, with a validity value of 0.5810, while DBScan with keywords comes second with a validity value of 0.4656. Because these results are still at level 2, meaning the cluster structure is in the good category, the researchers combined K-Means and DBScan with keywords. The resulting structure falls into level 1, the strong category, with a validity value of 0.7864, so the trending topics within each cluster are related to one another."

Depok: Fakultas Teknik Universitas Indonesia, 2024

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Collen, Morris F.

Computer medical databases: the first six decades (1950–2010)

"Chapter 1 offers an overview of the basic computer technology. Each succeeding chapter describes the problems in medicine, followed by a review in chronological sequence of why and how computers were applied to try to meet these problems. Only the technical aspects of computer hardware, software, and communications are discussed as they are necessary to explain how the technology was applied. This approach generally led to defining the objectives for applications of medical informatics. At the end of each chapter, the author summarizes his personal views and interpretations of the chapter contents. Although the concurrent evolution of medical informatics in Canada, Europe, and Japan certainly influenced workers in the United States, the scope of this historical review is limited to the development of medical informatics within the United States. Furthermore, this review is limited to electronic digital computers; it excludes mechanical, analog, and hybrid computers."

London: Springer, 2012

e20410766

eBooks Universitas Indonesia Library

Designs for clinical trials; perspectives on current issues

"This book will examine current issues and controversies in the design of clinical trials, including topics in adaptive and sequential designs, the design of correlative genomic studies, and the design of studies in which missing data is anticipated. Each chapter will be written by an expert conducting research in the topic of that chapter. As a collection, the chapters are intended to serve as guidance for statisticians designing trials."

New York: Springer, 2012

e20417633

eBooks Universitas Indonesia Library

Ward, Francine

Staying legal: a guide to copyright and trademark use

"Is your Web site in violation of copyright laws? Do you know how you can keep others from copying your creative products? What effect has the rapid proliferation of communications, from the Internet to iPods, had on copyright law?"

Alexandria, Virginia: American Society for Training & Development, 2007

e20441972

eBooks Universitas Indonesia Library

Limisgy Ramadhina Febirautami

Penentuan karakteristik lagu populer di industri musik Indonesia menggunakan music mining = Determining characteristics of popular songs in Indonesia's music industry using music mining

"Industri musik di Indonesia merupakan salah satu prioritas utama oleh Badan Ekonomi Kreatif BEKRAF untuk ditingkatkan daya saingnya. Dalam meningkatkan daya saingnya, industri musik dihadapi dengan tantangan utama yaitu sumber daya manusia yang kurang memadai serta pemanfaatan pasar yang belum optimal. Oleh karena itu, industri musik di Indonesia perlu mengembangkan strategi terbaru untuk menghadapi tantangan tersebut. Seiring berkembangnya teknologi di industri musik, data terkait musik seperti tangga lagu, fitur audio, serta lirik lagu semakin mudah didapatkan. Data tersebut berpotensi memberikan informasi penting dan karakteristik terkait lagu yang sedang tren dan diminati oleh masyarakat Indonesia.

Penelitian ini bertujuan untuk mengetahui karakteristik lagu yang populer atau diminati oleh masyarakat Indonesia menggunakan music mining. Penentuan karakteristik dilakukan dengan mengklasifikasikan lagu keseluruhan, lagu lokal, dan lagu internasional yang populer di Indonesia. Berdasarkan hasil penelitian ini, algoritme pohon keputusan C5.0 mampu memberikan karakteristik yang detail dan optimal untuk lagu keseluruhan, lagu lokal, dan lagu internasional dengan akurasi yang baik. Terdapat beberapa kesamaan dan perbedaan karakteristik yang dimiliki antara lagu secara keseluruhan, lagu lokal, dan lagu internasional yang berkaitan satu sama lain. Keterkaitan tersebut dapat digunakan menjadi strategi dan rekomendasi pelaku industri musik dalam memproduksi lagu yang sesuai dengan preferensi masyarakat.

The music industry is one of the main priorities of the Creative Economy Agency (BEKRAF) for improved competitiveness. In improving its competitiveness, the industry faces major challenges such as insufficient human resources and under-utilized market potential. Thus, the music industry in Indonesia needs to develop new strategies to address these challenges. As technology in the music industry develops, music-related data such as top charts, audio features, and song lyrics are becoming easier to obtain. These data have the potential to provide important information about the characteristics of songs that are trending and preferred in Indonesia.
This study aims to identify the characteristics of songs that are popular or preferred in Indonesia using music mining. The characteristics were determined by classifying overall songs, local songs, and international songs that are popular in Indonesia. Based on this study, the C5.0 decision tree algorithm is able to give detailed and optimal characteristics for overall, local, and international songs with good accuracy. There are some similarities and differences among the characteristics of these three groups, and they are related to one another. Those relations can be used by the music industry as strategy and recommendations for producing songs that match listener preferences in Indonesia. "

Depok: Fakultas Teknik Universitas Indonesia, 2018

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library

Naufal Farhan

Metode Improved Deep Embedded Clustering untuk Pendeteksian Topik = Improved Deep Embedded Clustering Method for Topic Detection

"

Pendeteksian topik adalah suatu proses yang digunakan untuk menganalisis kata-kata pada suatu koleksi data tekstual untuk menentukan topik-topik yang ada pada koleksi tersebut. Salah satu metode standar yang digunakan untuk pendeteksian topik adalah metode clustering. Deep embedded clustering (DEC) adalah algoritma clustering dengan pendekatan deep learning yang menyatukan pembelajaran fitur dan clustering menjadi satu kerangka kerja sehingga dapat menghasilkan kinerja yang lebih baik. Namun metode DEC memiliki kelemahan, yaitu terjadinya penyimpangan ruang embedded ketika melakukan pembelajaran yang didapat ketika membuang decoder. Kelemahan tersebut diatasi dengan tidak membuang decoder, sehingga diperoleh metode yang lebih baik lagi yaitu Improved Deep Embedded Clustering (IDEC). Proses mempertahankan decoder disebut sebagai pelestarian struktur lokal. Pada penelitian ini, metode IDEC diadaptasi untuk masalah pendeteksian topik data tekstual berbahasa Indonesia. Selanjutnya kinerja metode IDEC dibandingkan dengan metode penelitian lain yang menggunakan DEC untuk masalah pendeteksian topik yaitu dengan cara membandingkan nilai dari coherence. Nilai coherence yang dihasilkan menunjukkan bahwa metode DEC lebih cocok jika dibandingkan dengan metode IDEC untuk permasalahan pendeteksian topik. Hal tersebut terjadi karena bagian decoder pada metode IDEC diperbarui sehingga parameter decoder sudah tidak sesuai untuk mengembalikan data ke dimensi semula. Sedangkan pada metode DEC bagian decoder dibuang sehingga parameter tidak diperbarui.

Topic detection is a process used to analyze the words in a collection of textual data to determine the topics within that collection. One of the standard methods for topic detection is clustering. Deep embedded clustering (DEC) is a clustering algorithm with a deep learning approach that combines feature learning and clustering into a single framework to obtain better performance. However, DEC has a weakness, namely the distortion of the embedded space that is caused by discarding the decoder during learning. This weakness can be overcome by preserving the decoder, which yields a better method, namely Improved Deep Embedded Clustering (IDEC). Preserving the decoder is called local structure preservation. In this research, we adapt the IDEC method to the problem of topic detection in Indonesian textual data. Furthermore, we compare the performance of IDEC with that of other research using DEC for topic detection by comparing coherence values. The coherence values obtained show that DEC is more suitable than IDEC for topic detection problems. This happens because the decoder in IDEC is updated, so the decoder parameters are no longer suitable for returning the data to the original dimension, whereas in DEC the decoder is discarded, so its parameters are not updated.

"

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2020

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library

Anton Ade Putra

Analisis Sentimen dan Pemodelan Topik terkait Metaverse di Media Sosial: Studi Kasus Bidang Riset dan Pengembangan Metaverse Universitas T = Sentiment Analysis and Topic Modeling Related to Metaverse in Social Media: A Case Study at the Metaverse Research and Development Department of University T

"Universitas T memiliki rencana (roadmap) untuk mengembangkan berbagai jenis Metaverse di masa depan. Namun, ada kekhawatiran bahwa roadmap yang telah dibuat mungkin tidak sesuai dengan kebutuhan masyarakat. Oleh karena itu, penelitian ini bertujuan untuk menganalisis sentimen dan pemodelan topik tentang Metaverse di media sosial guna memberikan wawasan yang penting bagi roadmap pengembangan Metaverse di Universitas T dengan memperhatikan pendapat dan sentimen masyarakat. Data yang digunakan dalam penelitian ini adalah twit berbahasa Indonesia yang dikumpulkan dari bulan Agustus 2021 hingga April 2023. Untuk analisis, digunakan pustaka LazyPredict yang menghasilkan lima model klasifikasi, yaitu Bernoulli Naive Bayes (BernoulliNB), Nearest Centroid, Calibrated Classifier CV, Logistic Regression, dan Linear Support Vector Classification (LinearSVC). Hasil menunjukkan bahwa model BernoulliNB memiliki performa terbaik dengan nilai rata-rata F1 sebesar 0,788. Selain itu, penelitian ini juga mengidentifikasi topik-topik yang dibahas terkait dengan Metaverse menggunakan pustaka Bertopic. Temuan menunjukkan adanya topik negatif seperti ketidakpastian pengembangan Metaverse, skeptisisme terhadap teknologi baru, keterbatasan infrastruktur internet, kekhawatiran etika dan syariah, ketidakpastian legalitas, kekhawatiran privasi dan keamanan, serta skeptisisme terhadap kesiapan Indonesia dalam membangun Metaverse. Di sisi lain, topik positif meliputi peluncuran Metaverse Jagat Nusantara, potensi kripto dalam konteks Metaverse, perubahan nama Facebook menjadi Meta, konser virtual di Metaverse, kehidupan di dunia Metaverse, pengembangan teknologi Metaverse di dalam negeri, transformasi digital dan inovasi di era Metaverse, penggunaan blockchain, kripto, dan NFT dalam teknologi Metaverse, serta Manasik Haji di Metaverse. 
Hasil analisis sentimen dan pemodelan ini dapat memberikan wawasan yang berharga bagi Universitas T dalam memahami tren dan pandangan masyarakat terkait Metaverse. Hal ini akan membantu universitas dalam mengevaluasi roadmap Metaverse yang telah dibuat untuk memastikan kesesuaiannya dengan kebutuhan masyarakat.

Universitas T has a roadmap to develop various types of Metaverse in the future. However, there are concerns that the existing roadmap may not align with the needs of society. Therefore, this research aims to analyze sentiment and model topics related to the Metaverse on social media to provide valuable insights for the Metaverse development roadmap at Universitas T, taking into account the opinions and sentiments of the public. The data used in this study are Indonesian-language tweets collected from August 2021 to April 2023. The LazyPredict library is used for the analysis, yielding five classification models: Bernoulli Naive Bayes (BernoulliNB), Nearest Centroid, Calibrated Classifier CV, Logistic Regression, and Linear Support Vector Classification (LinearSVC). The results show that the BernoulliNB model performs best, with an average F1 score of 0.788. Additionally, this research identifies the topics discussed in relation to the Metaverse using the BERTopic library. Findings indicate the presence of negative topics such as uncertainty in Metaverse development, skepticism towards new technologies, limitations of internet infrastructure, ethical and Sharia concerns, legal uncertainties, privacy and security concerns, as well as skepticism about Indonesia's readiness to build the Metaverse. On the other hand, positive topics include the launch of Metaverse Jagat Nusantara, the potential of cryptocurrencies in the context of the Metaverse, the name change of Facebook to Meta, virtual concerts in the Metaverse, life in the Metaverse world, domestic Metaverse technology development, digital transformation and innovation in the era of the Metaverse, the use of blockchain, cryptocurrencies, and NFTs in Metaverse technology, as well as Manasik of Hajj in the Metaverse. The results of the sentiment analysis and topic modeling can provide valuable insights for Universitas T to understand the trends and public perspectives regarding the Metaverse. This will assist the university in evaluating the existing Metaverse roadmap to ensure its alignment with the needs of society."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Novialdi Ashari

Analisis Sentimen dan Pemodelan Topik Ulasan Pengguna di Google Play Store: Studi Kasus Aplikasi Learn Quran Tajwid = Sentiment Analysis and Topic Modelling of User Reviews on Google Play Store: A Case Study of Learn Quran Tajwid App

"Perkembangan pesat teknologi menyebabkan pertumbuhan pengguna perangkat mobile semakin meningkat. Hal tersebut mendorong para pengembang aplikasi untuk mengembangkan berbagai aplikasi. Aplikasi Learn Quran Tajwid merupakan aplikasi yang diperuntukkan bagi pengguna untuk belajar dan memahami bacaan al-quran lebih detail dengan audio yang tepat dalam melafadzkan al-quran dan pengguna dapat mempraktekkan bacaan dengan koreksi dari aplikasi. Pendapatan Learn Quran Tajwid bersumber pada layanan berlangganan dan iklan. Sumber utamanya pada pendapatan layanan paket berlangganan khususnya di Google Play Store namun sumber pendapatan utama tersebut terus mengalami penurunan pertumbuhan bulanan dari tahun sebelumnya. Target peningkatan pertumbuhan pendapatan bulanan Aplikasi Learn Quran Tajwid di Google Play Store dari tahun sebelumnya (y-o-y) tidak tercapai. Oleh sebab itu, dilakukan analisis akar masalah dan didapatkan masalah utamanya adalah kepuasan pelanggan menurun. Tujuan penelitian ini adalah melihat bagaimana pandangan pengguna Aplikasi Learn Quran Tajwid di Google Play Store dengan melakukan analisis sentimen dan pemodelan topik. Data ulasan yang digunakan berjumlah 5100 ulasan yang didapatkan dengan melakukan scraping dari ulasan pengguna aplikasi Learn Quran Tajwid di Google Play Store dengan rincian 3026 ulasan sebagai data latih. Selanjutnya data latih dianotasikan manual untuk menentukan sentimen positif atau negatif kemudian dilakukan preprocessing dan representasi teks menggunakan TF-IDF. Penelitian ini menggunakan algoritma NB, SVM, XGBoost, CNN, LSTM dan BERT untuk klasifikasi sentimen. Hasil eksperimen menunjukkan bahwa algoritma klasifikasi dengan kinerja terbaik adalah algoritma BERT dengan akurasi 96%, diikuti SVM imbalanced class dengan akurasi 95,2% serta SVM-smote dan LSTM dengan akurasi 94,8%. Sementara itu, algoritma pemodelan topik yang digunakan adalah LDA. Hasil pemodelan topik menggunakan algoritma LDA untuk sentimen positif dan negatif. Kesimpulan topik pada sentimen positif yakni pengguna merasa aplikasi sangat bagus dan memberikan manfaat yang besar, serta mudah digunakan. Sedangkan dari topik yang muncul pada sentimen negatif didapatkan kesimpulan yakni pengguna merasa iklan yang muncul sangat mengganggu dan mengurangi pengalaman pengguna walaupun pengguna merasa aplikasi bagus dan bermanfaat namun karena terdapat iklan yang sangat mengganggu berpengaruh terhadap kepuasan pengguna sehingga memberikan rating rendah.

The rapid development of technology has led to an increasing growth in mobile device users. This has driven application developers to create various apps. The Learn Quran Tajwid app is designed for users to learn and understand the recitation of the Quran in more detail, with accurate audio pronunciation. Users can practice their recitation and receive corrections from the app. The revenue for Learn Quran Tajwid comes from subscription services and advertisements. The main source of revenue is the subscription packages, particularly on the Google Play Store. However, the main revenue source has been experiencing a decline in monthly growth compared to the previous year. The target of increasing monthly revenue growth for the Learn Quran Tajwid app on the Google Play Store from the previous year (year-over-year) was not achieved. Therefore, an analysis of the root cause was conducted, and it was found that customer satisfaction has decreased. This research aims to examine the users' perspectives of the Learn Quran Tajwid app on the Google Play Store through sentiment analysis and topic modelling. A total of 5100 app reviews were used for the analysis, obtained by scraping user reviews of the Learn Quran Tajwid app from the Google Play Store. Out of these, 3026 reviews were used as training data. The training data was manually annotated to determine positive or negative sentiment, and then pre-processing and text representation using TF-IDF were performed. This study used the NB, SVM, XGBoost, CNN, LSTM, and BERT algorithms for sentiment classification. The experimental results showed that the BERT algorithm performed the best with an accuracy of 96%, followed by SVM with imbalanced classes at 95.2% accuracy, and SVM-SMOTE and LSTM with 94.8% accuracy. The topic modelling algorithm used was LDA, applied to both the positive and the negative sentiment reviews. In conclusion, the topics identified for positive sentiment indicate that users find the app to be excellent and highly beneficial, as well as easy to use. On the other hand, from the topics identified for negative sentiment, it can be concluded that users find the ads to be very disruptive and feel that they diminish the user experience. Although users perceive the app as good and useful, the presence of intrusive ads has a significant impact on user satisfaction, resulting in lower ratings. "

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Parluhutan, Matthew Tumbur

Ekstraksi Key Moments Otomatis pada Video Perkuliahan di Fasilkom UI Menggunakan Optical Character Recognition dan Topic Modelling = Automatic Key Moments Extraction on Lecture Videos in Fasilkom UI with Optical Character Recognition and Topic Modelling

"Pandemi COVID-19 mengubah pola kehidupan manusia, termasuk sistem perkuliahan yang berubah ke metode daring. Video perkuliahan dengan salindia menjadi salah satu pilihan sarana penyampaian materi kuliah secara daring. Penelitian ini bermaksud menguji keabsahan rancangan sistem yang mampu melakukan segmentasi temporal sesuai topik secara otomatis pada video perkuliahan. Sistem yang diajukan dibagi menjadi tiga sub-sistem yang memanfaatkan teknologi keyframe extraction, optical character recognition (OCR), dan topic modelling. Pertama, video perkuliahan akan diubah menjadi kumpulan keyframe dengan memanfaatkan metode Slide Detector yang dimodifikasi. Selanjutnya, akan dilakukan ekstraksi teks dari frame-frame tersebut menggunakan Tesseract OCR dengan preprocessing tambahan. Akhirnya, BERTopic dengan beragam algoritma clustering dan LDA diuji kemampuannya dalam topic modelling yang berguna untuk mengambil topik yang koheren dari teks tersebut. Penelitian pada tahap keyframe extraction menunjukkan bahwa terdapat peningkatan recall sebesar 0,235-025 dari 0 dan precision sebesar 0,619-0,75 dari 0 pada beberapa video pada Slide Detector termodifikasi. Sebaliknya, penelitian pada tahap OCR menunjukkan bahwa tambahan preprocessing belum bisa membantu meningkatkan performa Tesseract OCR. Pada tahap terakhir, ditemukan bahwa BERTopic lebih unggul daripada LDA dalam menarik topik yang koheren untuk use case penelitian ini. Agglomerative dan KMeans clustering ditemukan lebih optimal untuk kasus video perkuliahan jika dibandingkan dengan metode density-based. Augmentasi data dengan takaran yang sesuai diperlukan untuk mendapatkan hasil sedemikian rupa pada tahap ini. Secara umum, sistem dengan tiga bagian yang diusulkan pada penelitian ini sudah mampu melakukan segmentasi video perkuliahan sesuai tujuan, namun, video perkuliahan bersalindia merupakan dataset yang sangat heterogen dan merancang sebuah sistem yang mampu memanfaatkan dataset tersebut adalah tantangan tersendiri.

The COVID-19 pandemic changed the lifestyle of many people, including university lectures, which moved to online delivery. Lecture videos with slides became one option for delivering lecture materials online. This work attempts to show a proof of concept for a system design that is able to automatically segment a lecture video temporally based on topic. The proposed system is divided into three subsystems that make use of keyframe extraction, optical character recognition (OCR), and topic modelling techniques. First, a lecture video is converted to a collection of keyframes using a modified Slide Detector technique. Next, those frames are processed using Tesseract OCR with some additional preprocessing steps to extract text. Lastly, BERTopic with various clustering techniques and LDA are tested for topic modelling to obtain coherent topics from the extracted text. The research in the keyframe extraction step shows an increase of 0.235-0.5 points from 0 for recall and 0.619-0.75 points from 0 for precision on certain videos using the modified Slide Detector. On the other hand, the research in the OCR step shows that the additional preprocessing is not yet able to improve the performance of Tesseract OCR. In the last step, BERTopic proves to be better than LDA at obtaining coherent topics for this system's use case. Agglomerative and KMeans clustering are better suited to lecture videos than density-based methods. An appropriate amount of data augmentation is needed to obtain these results at this step. Overall, the three-part system proposed in this research is able to segment lecture videos as intended; however, lecture videos with slides form a very heterogeneous dataset, and designing a system that can handle all types of videos is a considerable challenge."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Muhammad Irfan Junaidi

Ekstraksi Key Moments Otomatis pada Video Perkuliahan di Fasilkom UI Menggunakan Optical Character Recognition dan Topic Modelling = Automatic Key Moments Extraction on Lecture Videos in Fasilkom UI with Optical Character Recognition and Topic Modelling

"Pandemi COVID-19 mengubah pola kehidupan manusia, termasuk sistem perkuliahan yang berubah ke metode daring. Video perkuliahan dengan salindia menjadi salah satu pilihan sarana penyampaian materi kuliah secara daring. Penelitian ini bermaksud menguji keabsahan rancangan sistem yang mampu melakukan segmentasi temporal sesuai topik secara otomatis pada video perkuliahan. Sistem yang diajukan dibagi menjadi tiga sub-sistem yang memanfaatkan teknologi keyframe extraction, optical character recognition (OCR), dan topic modelling. Pertama, video perkuliahan akan diubah menjadi kumpulan keyframe dengan memanfaatkan metode Slide Detector yang dimodifikasi. Selanjutnya, akan dilakukan ekstraksi teks dari frame-frame tersebut menggunakan Tesseract OCR dengan preprocessing tambahan. Akhirnya, BERTopic dengan beragam algoritma clustering dan LDA diuji kemampuannya dalam topic modelling yang berguna untuk mengambil topik yang koheren dari teks tersebut. Penelitian pada tahap keyframe extraction menunjukkan bahwa terdapat peningkatan recall sebesar 0,235-025 dari 0 dan precision sebesar 0,619-0,75 dari 0 pada beberapa video pada Slide Detector termodifikasi. Sebaliknya, penelitian pada tahap OCR menunjukkan bahwa tambahan preprocessing belum bisa membantu meningkatkan performa Tesseract OCR. Pada tahap terakhir, ditemukan bahwa BERTopic lebih unggul daripada LDA dalam menarik topik yang koheren untuk use case penelitian ini. Agglomerative dan KMeans clustering ditemukan lebih optimal untuk kasus video perkuliahan jika dibandingkan dengan metode density-based. Augmentasi data dengan takaran yang sesuai diperlukan untuk mendapatkan hasil sedemikian rupa pada tahap ini. Secara umum, sistem dengan tiga bagian yang diusulkan pada penelitian ini sudah mampu melakukan segmentasi video perkuliahan sesuai tujuan, namun, video perkuliahan bersalindia merupakan dataset yang sangat heterogen dan merancang sebuah sistem yang mampu memanfaatkan dataset tersebut adalah tantangan tersendiri.

The COVID-19 pandemic changed the lifestyle of many people, including university lectures, which moved to online delivery. Lecture videos with slides became one option for delivering lecture materials online. This work attempts to show a proof of concept for a system design that is able to automatically segment a lecture video temporally based on topic. The proposed system is divided into three subsystems that make use of keyframe extraction, optical character recognition (OCR), and topic modelling techniques. First, a lecture video is converted to a collection of keyframes using a modified Slide Detector technique. Next, those frames are processed using Tesseract OCR with some additional preprocessing steps to extract text. Lastly, BERTopic with various clustering techniques and LDA are tested for topic modelling to obtain coherent topics from the extracted text. The research in the keyframe extraction step shows an increase of 0.235-0.5 points from 0 for recall and 0.619-0.75 points from 0 for precision on certain videos using the modified Slide Detector. On the other hand, the research in the OCR step shows that the additional preprocessing is not yet able to improve the performance of Tesseract OCR. In the last step, BERTopic proves to be better than LDA at obtaining coherent topics for this system's use case. Agglomerative and KMeans clustering are better suited to lecture videos than density-based methods. An appropriate amount of data augmentation is needed to obtain these results at this step. Overall, the three-part system proposed in this research is able to segment lecture videos as intended; however, lecture videos with slides form a very heterogeneous dataset, and designing a system that can handle all types of videos is a considerable challenge."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2023

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Dinda Sigmawaty

Pemeringkatan Dokumen Secara Temporal dengan Dynamic Embeddings = Temporal Ranking with Dynamic Embeddings

"Saat mencari artikel yang diterbitkan dalam periode waktu yang panjang, pengguna biasanya membutuhkan dokumen yang tidak hanya relevan terhadap topik tetapi juga relevan terhadap waktu. Tesis ini membahas tentang pemeringkatan dokumen dengan konsep waktu atau temporal, di mana dokumen dengan topik dan waktu yang dekat dengan query harus diberikan peringkat yang lebih tinggi. Untuk mengetahui waktu yang sesuai dengan query pengguna, tesis ini mengembangkan teknik pemeringkatan temporal yang diperoleh dari distribusi keterkaitan kata dari waktu ke waktu yang dipelajari pada sebuah arsip berita dalam Bahasa Indonesia. Keterkaitan kata dipelajari menggunakan Dynamic Embeddings yaitu Word2Vec yang dipelajari terpisah dari waktu ke waktu, OrthoTrans-Word2Vec dan Dynamic Bernoulli Embeddings. Dalam menangkap relevansi secara topikal, model yang diusulkan menggunakan Dual Embedding Space Model (DESM) yang dibangun dengan teknik temporal sesuai dengan waktu pembuatan dokumen. Untuk meningkatkan nilai presisi, model tersebut juga menggunakan sebuah klasifikasi temporal yang dipelajari menggunakan Support Vector Machine (SVM) dan Basis Threshold. Skor tertinggi dicapai ketika membangun model menggunakan Word2Vec yaitu 66% pada presisi rata-rata dan 68% pada presisi awal. Model tersebut juga terbukti efektif pada query temporal yang memiliki pola seperti tren, periodisitas, dan musiman.

When searching for articles published over a long period of time, users usually require documents that are not only topically relevant but also created during relevant time periods. This thesis studies document ranking with a temporal concept, where documents whose topic and time closely match the query should be ranked higher. To capture the time intended by the user's query, the thesis develops a temporal ranking technique derived from the distribution of word relatedness over time, learned from a news archive in Bahasa Indonesia. Word relatedness is captured using dynamic embeddings, namely Word2Vec learned separately over time, OrthoTrans-Word2Vec, and Dynamic Bernoulli Embeddings. To capture topical relevance, the proposed model uses the Dual Embedding Space Model (DESM), built with the temporal technique according to the document timestamp. To improve precision, the model is also combined with temporal classification using a Support Vector Machine (SVM) and a threshold-based strategy. The highest score was achieved by a model using Word2Vec, namely 66% in average precision and 68% in early precision. The results also showed that the model is effective for temporal queries with patterns such as spikes, periodicity, and seasonality."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2019

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Arminditya Fajri Akbar

Analisis Persepsi Pengguna terhadap Platform Pembelajaran Daring selama Pandemi COVID-19 menggunakan Teknik Text Mining = User Perception Analysis of Online Learning Platform During the COVID-19 Pandemic Using Text Mining Techniques

"Terbatasnya interaksi sosial di tengah pandemi COVID-19 memaksa perubahan pada proses belajar mengajar secara konvensional menjadi pembelajaran jarak jauh. Peran platform pembelajaran daring diharapkan dapat menjadi sumber belajar yang optimal sehingga kesenjangan capaian belajar dapat diminimalkan. Namun, masyarakat mengalami culture shock karena perubahan tersebut, kesadaran peserta didik yang masih rendah untuk mengevaluasi hasil belajar, dan terdapat kendala terkait dengan akses Internet. Oleh sebab itu, pengukuran tingkat kepuasan pengguna perlu dilakukan untuk menilai efektivitas platform pembelajaran daring dalam menyediakan sumber belajar alternatif dengan menggunakan teknik text mining. Penelitian ini menganalisis persepsi pengguna terhadap layanan platform Ruangguru, Zenius, dan Quipper dengan mengolah data ulasan dari Google Play Store. Teknik text mining yang digunakan ialah pemodelan topik dengan metode Latent Dirichlet Allocation untuk mendefinisikan aspek layanan yang dibahas pada data ulasan. Selain itu, dilakukan analisis sentimen ulasan dari setiap aspek layanan menggunakan algoritma SentiStrengthID. Hasil dari kedua teknik text mining tersebut dikuantifikasi menggunakan metode Net Reputation Score untuk mendapatkan skor kepuasan pengguna sehingga dapat menjadi referensi bagi pihak penyedia platform dalam menentukan prioritas peningkatan layanan. Hasil penelitian menunjukan bahwa aspek promosi dan video berlangganan perlu menjadi fokus perbaikan bagi pihak penyedia platform Ruangguru. Kedua aspek tersebut mendapatkan skor kepuasan terendah sebesar -6,19% dan 15,84%. Sementara itu, aspek mengenai latihan soal dan akun pengguna pada platform Zenius perlu untuk menjadi prioritas perbaikan dengan masing-masing skor kepuasan yang diperoleh sebesar 34,91% dan 44,14%. 
Terakhir, aspek layanan dari platform Quipper mengenai latihan pembahasan dan registrasi pengguna perlu segera diperbaiki karena mendapatkan skor kepuasan yang kritikal sebesar -54,28% dan -16,66%. Berdasarkan hasil tersebut, didapatkan wawasan yang berguna bagi pihak penyedia untuk mempermudah pengambilan keputusan dalam optimasi platform pembelajaran daring.

Limited social interaction amid the COVID-19 pandemic has forced a change from the conventional learning process to distance learning. Online learning platforms are expected to provide optimal learning resources so that the gap in learning achievement can be minimized. However, society experienced culture shock due to the change, student awareness of the possibility to evaluate learning strategies and learning outcomes is still low during online learning activities, and there are issues related to Internet access. Therefore, it is necessary to measure user satisfaction, using text mining techniques, to assess the effectiveness of online learning platforms in providing alternative learning resources. This study analyzes user perceptions of Ruangguru, Zenius, and Quipper by exploring review data from the Google Play Store. The text mining technique used is topic modeling with the Latent Dirichlet Allocation method, which defines the service aspects discussed in the review data. Additionally, sentiment analysis was carried out to classify the emotional tendency of the user reviews for each service aspect using the SentiStrengthID algorithm. The results of both techniques are quantified using the Net Reputation Score method to obtain user satisfaction scores that platform providers can use as a reference in determining service improvement priorities. The results of the study reveal that the promotion and subscription video aspects need to be the focus of improvement for the Ruangguru platform provider. These two aspects receive the lowest satisfaction scores, -6.19% and 15.84%, respectively. Meanwhile, the tryout and user account aspects of the Zenius platform need to be prioritized for improvement, with satisfaction scores of 34.91% and 44.14%, respectively. Lastly, the tryout and user registration aspects of the Quipper platform need to be improved immediately by the service provider because they receive critical satisfaction scores of -54.28% and -16.66%, respectively. Based on these results, useful insights were obtained for providers to facilitate decision-making in optimizing online learning platforms."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2021

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Ignasius Harvey Pratama Gunawijaya

Segmentasi Pelanggan pada Pemasaran Email Berdasarkan Preferensi Topik dari Interaksi Historis = Customer Segmentation on Email Marketing Based on Topic Preferences from Historical Interaction

"Pemasaran email digunakan berbagai industri untuk berkomunikasi dan menjaga hubungan dengan pelanggannya. Untuk menjaga performa pemasaran email dan relevansi, personalisasi untuk setiap individu diterapkan. Personalisasi ini seringkali dibuat dengan data transaksional untuk mencerminkan perilaku dan ketertarikan setiap pelanggan. Akan tetapi, tidak semua industri memiliki jumlah data transaksional yang cukup untuk setiap pelanggan. Salah satu contoh adalah industri perhotelan yang memiliki frekuensi transaksi lebih jarang dibandingkan dengan sektor industri seperti retail dan e-commerce. Tantangan yang muncul adalah bagaimana memanfaatkan data non-transaksional untuk mencari tahu ketertarikan dan preferensi pelanggan. Penelitian ini mengusulkan untuk memodelkan preferensi pelanggan dan membangun segmentasi pelanggan pada pemasaran email berdasarkan topik dari kampanye beserta interaksi historis dengan pelanggannya. Biterm topic model digunakan untuk memodelkan topik judul email yang dianggap sebagai preferensi pelanggan. Segmentasi dibangun dengan menggabungkan topik yang telah dimodelkan dengan interaksi historis pelanggan. Penelitian ini mengevaluasi performa pemodelan topik dan segmentasi menggunakan data kampanye 6 bulan terakhir. Hasil penelitian menunjukkan bahwa 69% pembuka email dalam 6 bulan terakhir memiliki preferensi yang sesuai dengan interaksi historisnya. Segmentasi berdasarkan topik juga mampu meningkatkan performa tingkat buka email sampai hampir 2 kali lipat. Keluaran dari penelitian yang berupa segmentasi berdasarkan topik dapat digunakan oleh pemasar untuk membangun strategi dan mencapai performa pemasaran email yang lebih tinggi.

Email marketing is widely used in many industries to communicate and maintain relationships with existing customers. In order to keep email marketing performance and relevance high, personalization is applied for each customer. This personalization is often built using transactional data to reflect each customer’s behavior and interests. However, not all industries have a sufficient amount of transactional data on each customer. One example is the hotel industry, which has less frequent transactions than industries such as retail and e-commerce. The challenge is how to make use of non-transactional data to discover user interests and preferences. This study proposes to model customer preferences and build customer segmentation in email marketing based on campaign topics and customers’ historical interactions. A biterm topic model is used to model the topics of email subjects, which are taken to represent customers’ preferences. Segmentation is built by combining the modeled topics with the customers’ historical interactions. This study evaluates both the topic modeling and the segmentation using email campaign data from the last 6 months. The results show that 69% of email openers in the last 6 months have preferences that match their historical interactions. Topic-based segmentation can also improve the open rate by almost 2 times on average. The output of the study, in the form of topic-based segmentation, can be used by email marketers to design new strategies and achieve higher email marketing performance."

Depok: Fakultas Teknik Universitas Indonesia, 2021

T-Pdf

UI - Tesis Membership Universitas Indonesia Library
