Search Results

Found 11 documents matching the query

Tulus Setiawan

Studi Komparasi Kinerja Analisis Sentimen Bahasa Indonesia Berbasis Large Language Model BERT dan GPT = Comparative Study of Sentiment Analysis Performance of Indonesian Language Based on Large Language Model BERT and GPT

"Indonesia was one of the countries affected by COVID-19. This impacted the tourism sector, particularly the hotel industry. The tourism sector in Indonesia is now beginning to recover, especially the hotel industry. The Central Statistics Agency (Badan Pusat Statistik, BPS) recorded that in 2023 the room occupancy rate (tingkat penghunian kamar, TPK) of star-rated hotels increased compared to 2022, with the increase for star-rated hotels reaching 51.12%. With the rising demand for hotel occupancy, customer reviews of hotels have become important to analyze. One type of analysis that can be performed on these reviews is sentiment analysis, which classifies the sentiment contained in each review into specific sentiment groups. Deep learning models such as Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), as well as hybrid models and fully-connected layer neural networks with Bidirectional Encoder Representations from Transformers (BERT) representations, have been shown to perform well in sentiment analysis. However, they commonly face problems of flexibility, time efficiency, and the resources required to use them. Prompt-based GPT methods can therefore be one solution to these problems. With prompt-based GPT, users can directly leverage the knowledge and language understanding that the GPT model acquired during training on a very large text corpus. This allows the model to generate accurate sentiment predictions without a long and complex training process. This study analyzes and compares the performance of the Large Language Models BERT and GPT as methods for Indonesian-language sentiment analysis.
The results show that the average overall performance of the GPT model is superior to that of the BERT model with a fully-connected layer neural network (BERT-NN) on the tiket.com, PegiPegi, and Traveloka datasets. Specifically, the GPT model with a zero-shot approach has the best average performance compared to the one-shot and few-shot approaches. Averaged across the three datasets, GPT with a zero-shot approach improves on BERT-NN by 1.28%, 1.45%, and 6.2% in accuracy, F1-score, and sensitivity, respectively. These results demonstrate the potential of prompt-based GPT methods as an efficient and flexible alternative for sentiment analysis of Indonesian-language hotel reviews."
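The abstract above reports accuracy, F1-score, and sensitivity. As a minimal sketch of how these three metrics relate, the Python function below computes them from gold and predicted labels. It assumes a binary positive/negative labeling, which the abstract does not confirm; the function name `sentiment_metrics` and the label strings are hypothetical.

```python
def sentiment_metrics(y_true, y_pred, positive="positive"):
    """Return accuracy, sensitivity (recall on the positive class), and F1.

    Assumption: binary labels; anything other than `positive` counts as negative.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}
```

Sensitivity is reported separately from F1 because it measures only how many truly positive reviews were recovered, regardless of false positives.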

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Aulia Nur Fadhilah

Konstruksi Knowledge Graph pada Dokumen Peraturan Perundang-undangan Indonesia Menggunakan Large Language Model = Knowledge Graph Construction on Indonesian Legal Documents using Large Language Model

"Indonesia is a state governed by law that adopts the principle of Legal Fiction (Fictie Hukum), which presumes that every individual knows the law without exception. Providing easy access to legal documents is a consequence of this principle. Although several online legal search services are available from both government and private entities, these services do not yet capture intra-document and inter-document relationships well. To improve legal search, a knowledge graph (KG) named LexID represents Indonesian legislation in graph form. This KG was constructed using a rule-based approach. However, a rule-based approach does not adapt easily to changes in document format or content and requires continuous maintenance. This study proposes an alternative approach to constructing LexID using pre-trained large language models (LLMs), specifically CodeGemma with 7 billion parameters, Code Llama with 7 billion parameters, and Phi-3 with 7 billion parameters. Two prompt types are used, code and text, each in 1-shot and 2-shot variations, giving a total of twelve experimental scenarios. The constructed KG is then evaluated against the existing LexID KG using precision, recall, and F1 score. The resulting F1 scores are as follows: for 1-shot text prompts, CodeGemma 0.405, Code Llama 0.452, and Phi 0.362; for 1-shot code prompts, CodeGemma 0.645, Code Llama 0.567, and Phi 0.526; for 2-shot text prompts, CodeGemma 0.572, Code Llama 0.502, and Phi 0.386; and for 2-shot code prompts, CodeGemma 0.687, Code Llama 0.583, and Phi 0.539."
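The evaluation described above compares a constructed KG with the reference LexID KG using precision, recall, and F1. A minimal Python sketch of such a comparison follows. It assumes each KG is a set of (subject, predicate, object) triples and that a triple counts as correct only on exact match; the abstract does not state the matching rule, and the function name and example triples are hypothetical.

```python
def triple_scores(predicted, reference):
    """Score predicted triples against reference triples by exact match."""
    predicted, reference = set(predicted), set(reference)
    matched = len(predicted & reference)

    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical triples, for illustration only.
reference_kg = {
    ("doc_A", "amends", "doc_B"),
    ("doc_A", "hasArticle", "article_1"),
    ("doc_A", "hasArticle", "article_2"),
    ("doc_B", "repeals", "doc_C"),
}
predicted_kg = {
    ("doc_A", "amends", "doc_B"),
    ("doc_A", "hasArticle", "article_1"),
    ("doc_A", "hasArticle", "article_9"),
}
precision, recall, f1 = triple_scores(predicted_kg, reference_kg)
```

Exact matching is the strictest choice; a looser rule (for example, normalizing identifiers before comparison) would raise all three scores.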

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Muhammad Halif

Konstruksi Knowledge Graph pada Dokumen Peraturan Perundang-undangan Indonesia Menggunakan Large Language Model = Knowledge Graph Construction on Indonesian Legal Documents using Large Language Model

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Muhammad Haddad

Konstruksi Knowledge Graph pada Dokumen Peraturan Perundang-undangan Indonesia Menggunakan Large Language Model = Knowledge Graph Construction on Indonesian Legal Documents using Large Language Model

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Muhammad Haddad

Konstruksi Knowledge Graph pada Dokumen Peraturan Perundang-undangan Indonesia Menggunakan Large Language Model = Knowledge Graph Construction on Indonesian Legal Documents using Large Language Model

Fakultas Ilmu Komputer Universitas Indonesia, 2024

S-pdf

Unggah3 Universitas Indonesia Library

Muhammad Haddad

Konstruksi Knowledge Graph pada Dokumen Peraturan Perundang-undangan Indonesia Menggunakan Large Language Model = Knowledge Graph Construction on Indonesian Legal Documents using Large Language Model

Fakultas Ilmu Komputer Universitas Indonesia, 2024

S-pdf

Unggah3 Universitas Indonesia Library

Muhammad Halif

Konstruksi Knowledge Graph pada Dokumen Peraturan Perundang-undangan Indonesia Menggunakan Large Language Model = Knowledge Graph Construction on Indonesian Legal Documents using Large Language Model

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Muhammad Hanif Pramudya Zamzami

Analisis dan Pengembangan Penalaran Deduktif pada Large Language Model = Analysis and Development of Deductive Reasoning in Large Language Model

"Deductive reasoning is a method of logical thinking in which one draws specific conclusions (hypotheses) from general premises or statements that are taken to be true by applying rules of logical inference. Rules of logical inference are principles of logic that allow one to derive valid hypotheses from given premises. Although deductive reasoning has the advantage of validity, humans tend to make mistakes when reasoning deductively. One language model for deductive reasoning is Natural Logic (NatLog), a machine-learning-based model trained to classify conditional relations between sentences. However, the model has limitations on long sentence spans. Large Language Models (LLMs) such as Generative Pre-trained Transformer (GPT), on the other hand, have shown good performance in deductive reasoning tasks, especially with the Chain of Thought (CoT) method. However, CoT still suffers from hallucination and inconsistency in its intermediate steps, which lead to invalid final conclusions. The Chain of Thought - Self-Consistency (CoT-SC) method is a development of CoT that aims to improve the reasoning ability of LLMs. In CoT-SC, CoT is run several times to produce several sample answers. A mode operation is then performed, that is, the answer that occurs most often among the generated samples is selected as the final answer. The most frequent answer is considered the most consistent and accurate one. The purpose of this study is to implement and analyze the ability of the CoT-SC method on the GPT model in solving deductive reasoning tasks.
This study evaluates the deductive reasoning ability of the GPT model using three data sources representing three different deductive reasoning task domains, namely ProntoQA, ProofWriter, and FOLIO. It then compares the performance of the LLM using CoT-SC with that of humans in solving deductive reasoning tasks. The results show that the CoT-SC method achieves good accuracy in the majority of the deductive reasoning tasks, and that GPT with CoT-SC comes out ahead in 1 of the 3 deductive reasoning task domains. These results indicate that the GPT model with the CoT-SC method has potential in deductive reasoning tasks."
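The mode operation in CoT-SC described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: it assumes the final answer of each CoT run has already been extracted as a string, and the function name `self_consistent_answer` and the sample answers are hypothetical.

```python
from collections import Counter


def self_consistent_answer(sampled_answers):
    """Return the most frequent answer among several CoT samples.

    Ties are broken in favor of the answer that appeared first,
    which is how Counter.most_common orders equal counts.
    """
    if not sampled_answers:
        raise ValueError("at least one sampled answer is required")
    answer, _count = Counter(sampled_answers).most_common(1)[0]
    return answer


# Hypothetical final answers extracted from five independent CoT runs.
samples = ["True", "False", "True", "Unknown", "True"]
final_answer = self_consistent_answer(samples)
```

Only the final answers are voted on; the intermediate reasoning steps of each run are discarded, which is why inconsistent steps in a minority of runs do not decide the result.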

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Muhammad Naufal Faza

Pengembangan Chatbot Berbasis Komodo-7B dengan Low-Rank Adaptation dan Retrieval-Augmented Generation untuk Pemahaman dan Tanya Jawab pada Dokumen PDF Bahasa Indonesia = Komodo-7B Based Chatbot Development with Low-Rank Adaptation and Retrieval-Augmented Generation for Understanding and Question-Answering on Indonesian PDF Documents

"This research aims to develop a chatbot system capable of answering academic questions about Computer Engineering at Universitas Indonesia. The system uses the Komodo-7B Large Language Model (LLM), fine-tuned with Low-Rank Adaptation (LoRA) and integrated with Retrieval Augmented Generation (RAG). The Ultrachat dataset, translated into Indonesian, is used to fine-tune the Komodo-7B model, while the 2020 v4 Computer Engineering Curriculum PDF document serves as the information source for the RAG model.

Performance evaluation of the Komodo-7B model shows that LoRA effectively improves the model's ability to understand and generate Indonesian conversational text. However, chatbot testing on two question datasets, a custom dataset generated using Giskard and the ChatGPT API and an adapted version of the Fathurrahman Irwansa dataset, shows that the chatbot system still has room for improvement. The low accuracy on both datasets (32% on the custom dataset and 24.1% on the Fathur dataset) indicates that the retrieval system is not sufficiently accurate in finding relevant context. Nevertheless, when the RAG model does retrieve relevant context, the Komodo-7B model shows fairly high accuracy (80% on the custom dataset and 91.29% on the Fathur dataset, calculated as the number of cases in which both the Komodo-7B answer and the context are correct, divided by the number of cases in which the context is correct).

The findings suggest that the Komodo-7B model has good potential for chatbot systems when combined with a more accurate retrieval system. This study contributes to the development of LLM-based chatbot systems for answering academic questions and opens up opportunities for broader use within Universitas Indonesia."
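The conditional accuracy defined in the abstract (cases where both the answer and the context are correct, divided by cases where the context is correct) can be written out as a short Python sketch. The record layout, a pair of booleans per question, and the function name are assumptions made for illustration.

```python
def accuracy_given_correct_context(records):
    """Answer accuracy restricted to questions whose retrieved context was correct.

    `records` is a list of (context_correct, answer_correct) boolean pairs,
    a hypothetical layout chosen for this sketch.
    """
    context_hits = sum(1 for context_ok, _ in records if context_ok)
    both_correct = sum(1 for context_ok, answer_ok in records if context_ok and answer_ok)
    if context_hits == 0:
        raise ValueError("no question had a correct context")
    return both_correct / context_hits


# Hypothetical evaluation log: (context_correct, answer_correct) per question.
log = [(True, True), (True, False), (False, False), (True, True), (False, True)]
conditional_accuracy = accuracy_given_correct_context(log)
```

Separating this figure from the overall accuracy isolates the generator from the retriever: a low overall score with a high conditional score points at retrieval as the weak component.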

Depok: Fakultas Teknik Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Muhammad Haddad

Konstruksi Knowledge Graph Pada Dokumen Peraturan Perundang-undangan Indonesia Menggunakan Large Language Model (LLM) = Knowledge Graphs Construction on Indonesian Legal Documents using Large Language Model (LLM)

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2024

S-pdf

UI - Skripsi Membership Universitas Indonesia Library
