Search Results

Found 50723 documents matching the query

Eryawan Deise Ulul

Implementasi hierarchical clustering menggunakan k-mer sparse matrix untuk menganalisis kekerabatan virus mers-cov = Implementation of hierarchical clustering using k-mer sparse matrix to analyze mers-cov genetic relationship / Eryawan Deise Ulul

"[ABSTRACT

Hierarchical clustering is an effective method for building a phylogenetic tree from the distance matrix between DNA sequences. One way to build the distance matrix is the k-mer method, whose advantage is that it is more time-efficient. Building the distance matrix with the k-mer method starts by forming a k-mer sparse matrix from each DNA sequence. Next, a k-mer singular value vector is formed. The final step computes the distance between the vectors. This thesis analyzes MERS-CoV DNA sequences by implementing hierarchical clustering with a k-mer sparse matrix, so that the ancestor of each MERS-CoV DNA sequence can be identified.]"
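The k-mer pipeline described in this abstract can be illustrated with a minimal Python sketch. It is a simplified stand-in, not the thesis's implementation: it uses plain k-mer count vectors in place of the k-mer sparse matrix and singular value vector, and the function names (`kmer_vector`, `distance_matrix`), the value of k, and the toy sequences are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt


def kmer_vector(seq, k=3):
    """Count every overlapping k-mer of seq, in a fixed alphabetical order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
    return [counts[w] for w in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs, k=3):
    """Pairwise distances between the k-mer count vectors of the sequences."""
    vectors = [kmer_vector(s, k) for s in seqs]
    return [[euclidean(u, v) for v in vectors] for u in vectors]


# Toy sequences (hypothetical, not MERS-CoV data).
seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
D = distance_matrix(seqs, k=2)
```

A matrix such as `D` is the kind of input a hierarchical clustering routine takes when it builds a tree; here the first two toy sequences end up much closer to each other than to the third.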

2015

T44260

UI - Tesis Membership Universitas Indonesia Library

Muhammad Naufal Luthfi

Clustering Daerah Bencana Alam di Indonesia Dengan Menggunakan Metode Hierarchical Clustering dan Fuzzy C-Means = Clustering of Natural Disaster Areas in Indonesia Using Hierarchical Clustering and Fuzzy C-Means Methods

"Advancing civilization has intensified the conflict between humans and the environment, leading to frequent natural disasters. Many countries are affected, Indonesia among them: its conditions and geographic location expose it to numerous natural disasters. It is therefore necessary to group disaster-affected areas in Indonesia in order to identify the regions hit most often, and clustering methods can be used for this. According to the literature review conducted, no previous study has used hierarchical clustering and fuzzy c-means to cluster natural disaster areas in Indonesia. The aim of this research is therefore to classify the areas of Indonesia that frequently experience natural disasters using hierarchical clustering and fuzzy c-means. The data are natural disaster records for Indonesia from 2019 to 2023. The variables are the numbers of forest and land fires, floods, extreme weather events, tidal waves, landslides, droughts, volcanic eruptions, and earthquakes in each disaster-affected district (kabupaten). The clustering results indicate 66 regions that frequently experience floods, 45 regions that frequently experience forest fires and tidal waves, and 30 regions that frequently experience extreme weather, landslides, droughts, volcanic eruptions, and earthquakes."
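As a rough illustration of the fuzzy c-means method named in this abstract, the sketch below computes the standard membership update for fixed cluster centers. It is not the study's code: the function name `fcm_memberships`, the fuzzifier `m = 2`, and the toy points and centers are assumptions, and a full fuzzy c-means run would alternate this step with a center update until convergence.

```python
from math import dist


def fcm_memberships(points, centers, m=2.0):
    """Fuzzy c-means membership of each point in each cluster, centers fixed."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        d = [dist(x, c) for c in centers]
        if 0.0 in d:
            # The point coincides with a center: share full membership there.
            zeros = d.count(0.0)
            memberships.append([1.0 / zeros if di == 0.0 else 0.0 for di in d])
            continue
        memberships.append(
            [1.0 / sum((di / dk) ** power for dk in d) for di in d]
        )
    return memberships


# Toy data (hypothetical): three points and two centers on a line.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
U = fcm_memberships(points, centers)
```

Unlike hard clustering, each row of `U` spreads a point's membership across all clusters and sums to 1, so a district lying between two disaster profiles would belong partly to both.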

Jakarta: Fakultas Teknik Universitas Indonesia, 2024

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Banjarnahor, Evander

Analisis Kekerabatan pada Barisan DNA SARS-Cov-2 Berdasarkan Pembentukan Pohon Filogenetik dengan Metode Hierarchical dan K-Means Clustering Menggunakan Multiple Encoding Vector dan K-Mer = Implementation of Hierarchical and K-Means Clustering Methods Using Multiple Encoding Vector in Analyzing Kinship in SARS-Cov-2 DNA Sequences

"According to WHO data, by mid-July 2021 more than 185.2 million people worldwide had been infected with the coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The virus attacks the human respiratory system, which can cause lung infections and even death. More than 4 million people worldwide have died from the infection. In Indonesia alone, by mid-July 2021 more than 2.4 million people had been infected and more than 65.4 thousand had died. Based on these data, a kinship analysis of SARS-CoV-2 is needed to help reduce its spread and to inform social restrictions between countries. The kinship of the COVID-19 virus and its spread can be identified by building a phylogenetic tree and by clustering. In this study the phylogenetic tree is built with hierarchical clustering, using the Multiple Encoding Vector method and K-mers based on the translation of DNA codons into amino acids, with Euclidean distance used to determine the distance matrix. K-means clustering is then used to examine the spread, with the value of k taken from the number of centroids produced by hierarchical clustering. The study sampled SARS-CoV-2 DNA sequences from several affected countries. In the simulation results, the ancestor of SARS-CoV-2 came from China, and the closest ancestors of the virus found in Indonesia came from India, Australia, and Spain. The SARS-CoV-2 DNA sequences form 9 clusters, of which the sixth has the most members. The analysis also reports that the method groups the data very well, with a reported value of 97.4%."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2021

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Lellyana Juliet Wantania

Pemanfaatan Algoritma Clustering Untuk Penyusunan Pola Karier Jabatan Struktural Berbasis Kompetensi: Studi Kasus Pemerintah Provinsi Sulawesi Utara = The Use of Clustering Algorithm to Form Career Patterns in Structural Level Based on Competencies: A Case Study of North Sulawesi Provincial Government

"Law Number 5 of 2014 on the State Civil Apparatus mandates a merit system, which aims to ensure that positions in government agencies are held by employees who meet the qualification and competency requirements. The Regional Civil Service Agency of North Sulawesi Province, as the agency with authority over personnel matters, is obliged and responsible for creating a clean system of government, including implementing the merit system. Structural positions have been filled in accordance with procedure, but position mapping remains difficult. This research was therefore conducted to determine competency-based career patterns for structural positions within the North Sulawesi Provincial Government. The data mining technique used is clustering, specifically agglomerative hierarchical clustering with Ward's method, with Orange as the software. Following the Knowledge Discovery in Database (KDD) stages, positions are grouped by the characteristics of their technical competencies, as derived from the job competency standards. The grouping yields seven clusters, which were validated by the relevant officials. The resulting career pattern shows two directions of movement: promotion and transfer/rotation. Promotion is a move to a position in the same cluster at a higher level than the previous position, while transfer/rotation is a move to a position in the same cluster at the same level as the previous position."

Jakarta: Fakultas Ilmu Komputer Universitas Indonesia, 2021

TA-pdf

UI - Tugas Akhir Universitas Indonesia Library

Bayu Permata Negara

Cluster Ensemble pada Data Campuran dalam Pengelompokan Sekolah Menengah Pertama di Provinsi Jawa Barat = Cluster Ensemble Based Mixed Data Clustering of Junior High School in West Java Province

"Cluster analysis is a multivariate method that aims to group observations based on their characteristics. One such method is the cluster ensemble, in which a clustering method is run repeatedly until the result is better than that of a single run. This study uses Cluster Ensemble Based Mixed Data Clustering (CEBMDC), a clustering method commonly used for data with mixed variables, that is, numerical and categorical. The first step splits the initial data into two parts: data with only numerical variables and data with only categorical variables. Each part is then clustered, simultaneously, with a method suited to its variable type. The two results form a new data set with two categorical variables, namely the cluster labels from the numerical variables and the cluster labels from the categorical variables, and this new data set is then clustered in turn. The method for the numerical data is Hierarchical Agglomerative Clustering. The methods for the categorical data are ROCK (RObust Clustering using linKs) and K-medoids/PAM (Partition Around Medoids), and this study compares their results. Clustering was performed on data on school facilities and infrastructure from 5,094 junior high schools (SMP) in West Java. The best-performing method in this study is Ensemble K-medoids, judged by the smallest ratio between the within-group standard deviation (SW) and the between-group standard deviation (SB). The study produced 3 groups that reflect the condition of junior high schools in West Java."

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2020

T-pdf

UI - Tesis Membership Universitas Indonesia Library

Azmi Jundan Taqiy

Optimasi Rute Pelayaran Perintis di Wilayah NTT-Maluku Barat Daya Menggunakan Density-Based Spatial Clustering of Applications with Noise dan Travelling Salesman Problem = Optimization of Pelayaran Perintis in The NTT-Southwest Maluku Region Using Density-Based Spatial Clustering of Applications with Noise and Travelling Salesman Problem

"Indonesia, as an archipelagic country, has more than 17,000 islands. This poses particular challenges for inter-island connectivity, especially in remote and underdeveloped areas. Pelayaran Perintis (pioneer shipping) is a shipping program subsidized by the Indonesian government whose main goal is to improve the economy of remote and underdeveloped areas. At present, however, its performance is still not optimal for achieving that goal, as shown by round voyages that can take up to 14 days on a route and by the low achievement of voyage targets. The routes therefore need to be evaluated and made more efficient, and one way to do so is to re-route them. This study re-routes Pelayaran Perintis in the NTT-Southwest Maluku region by first clustering with DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and then optimizing with a TSP (Travelling Salesman Problem) approach. The result is a 55% reduction in the average route distance (from 1276 NM to 569.3 NM) and a 74% reduction in the average round-voyage time (from 13.3 days to 3.5 days). In addition, inequality between routes decreased, as seen in the range of the number of ports, the distance traveled, and the round-voyage time across Pelayaran Perintis routes in the NTT-Southwest Maluku region."
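The two percentage reductions quoted in this abstract follow directly from the before and after averages it reports. The short Python check below reproduces the arithmetic; the helper name `percent_reduction` is an assumption introduced for illustration.

```python
def percent_reduction(before, after):
    """Relative reduction from before to after, as a percentage."""
    return 100.0 * (before - after) / before


# Figures taken from the abstract.
distance_cut = percent_reduction(1276.0, 569.3)  # average route distance, NM
voyage_cut = percent_reduction(13.3, 3.5)        # average round voyage, days
```

Rounded to whole percentages, these values agree with the 55% and 74% stated in the abstract.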

Depok: Fakultas Teknik Universitas Indonesia, 2022

S-pdf

UI - Skripsi Membership Universitas Indonesia Library

Catur Adi Nugroho

Perbandingan kinerja algoritma agglomerative hierarchical pada document clustering

"This final project report describes the author's research comparing the performance of several agglomerative hierarchical algorithms in clustering documents to obtain a hierarchical cluster solution. The algorithms compared are single link, complete link, and average. The comparison is based on the quality of the clusters produced on a number of datasets. The results show that the average algorithm is the best at producing a hierarchical cluster solution, followed by single link and then complete link.

The research also applies feature selection techniques to see how much efficiency can be gained without reducing the quality of the resulting cluster solution. The techniques used are Document Frequency thresholding and Information Gain. Both gain efficiency by including only the important words in the clustering process. The research measures how much efficiency each technique achieves and then compares the two. The results show that both Document Frequency thresholding and Information Gain achieve efficiency at the predefined reduction points, that is, at 10%-90% of the number of unique words, without loss of quality. The results also show that the two techniques are equally effective at reducing the dimensionality of the datasets used."
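The three agglomerative algorithms compared in this abstract differ only in how the distance between two clusters is derived from the pairwise distances of their members. The following naive Python sketch shows that difference under stated assumptions: it uses Euclidean distance on toy numeric points rather than document vectors, and the names `LINKAGE` and `agglomerate` are hypothetical, not taken from the report.

```python
from math import dist

# How each linkage turns member-to-member distances into a cluster distance.
LINKAGE = {
    "single": min,                             # closest pair of members
    "complete": max,                           # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),   # mean over all pairs
}


def agglomerate(points, linkage="average"):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    combine = LINKAGE[linkage]
    clusters = [(i,) for i in range(len(points))]  # start with singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ds = [dist(points[i], points[j])
                      for i in clusters[a] for j in clusters[b]]
                d = combine(ds)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


# Toy one-dimensional points (hypothetical).
points = [(0.0,), (1.0,), (3.0,), (7.0,)]
history = {name: agglomerate(points, name) for name in LINKAGE}
```

On this toy data all three linkages merge the same clusters in the same order, but the heights of the merges differ, which is what changes the shape of the resulting hierarchy on real document collections.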

Depok: Universitas Indonesia, 2007

S-Pdf

UI - Skripsi Membership Universitas Indonesia Library

Diyah Septi Andryani

Implementasi hybrid clustering menggunakan algoritma fuzzy c-means dan algoritma divisive untuk menganalisis kekerabatan dna human papillomavirus penyebab kanker serviks = The implementation of hybrid clustering using fuzzy c means algorithm and divisive algorithm for analysing dna human papillomavirus cause of cervical cancer

"Clustering aims to classify different patterns into groups called clusters. Gene analysis using clustering is considered more accurate than nucleotide analysis using DNA alignment. The hybrid clustering in this thesis combines the fuzzy c-means algorithm and a divisive algorithm, which improves accuracy compared with traditional partitional clustering. The divisive algorithm is applied in the second step, to the result of the fuzzy c-means partitioning.

The best number of clusters is the one with the minimum Davies-Bouldin Index (DBI). The data are 1252 HPV (Human papillomavirus) DNA sequences obtained in FASTA format from the GenBank database of the National Center for Biotechnology Information (NCBI) at http://www.ncbi.nlm.nih.gov. The sequences are converted into numerical form through feature extraction using n-mer frequencies and then normalized. Feature extraction, normalization, and the fuzzy c-means and divisive algorithms of the hybrid clustering method were run with an open source program.

At the first level of hybrid clustering, the optimum number of clusters is 3, with a minimum DBI of 0.9715919. At the second level, cluster 1 is divided into 9 subclusters with a minimum DBI of 0.8909797, cluster 2 into 2 subclusters with a minimum DBI of 0.7650508, and cluster 3 into 2 subclusters with a minimum DBI of 0.9112528. The DBI values at the second level are always smaller than the DBI value at the first level, which indicates that hybrid clustering gives better clustering results in terms of DBI."
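The Davies-Bouldin Index used in this abstract to choose the number of clusters can be sketched in a few lines of Python. This is an illustration under assumptions, not the thesis's code: it takes hard cluster assignments (fuzzy c-means memberships would first have to be converted to hard labels), uses Euclidean distance, and the function names and toy clusters are hypothetical.

```python
from math import dist


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin Index of a hard clustering; lower means better separated.

    Assumes at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


# Toy clusterings (hypothetical): same cluster shapes, different separation.
far_apart = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close_by = [[(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0), (3.0, 2.0)]]
dbi_far = davies_bouldin(far_apart)
dbi_close = davies_bouldin(close_by)
```

Because the index divides within-cluster scatter by between-centroid distance, the better separated clustering gets the smaller value, which is why the abstract selects the cluster count with the minimum index.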

Depok: Fakultas Matematika dan Ilmu Pengetahuan Alam Universitas Indonesia, 2017

T47171

UI - Tesis Membership Universitas Indonesia Library

Hengki Muradi

Penerapan metode pengelompokan hierarchical ordered partitioning and collapsing hybrid (Hopach) untuk menganalisis kekerabatan virus ebola = Application of hierarchical ordered partitioning and collapsing hybrid method to analyzing phylogenetically on ebola virus

"[Salah satu tujuan dalam studi ekpresi gen (DNA/Protein) adalah menemukan subbagian

yang penting secara biologis dan kelompok-kelompok dari gen-gen. Pengelompokan gen tersebut dapat dilakukan dengan metode hirarki maupun metode partisi. Kedua metode pengelompokan dapat dikombinasikan, dimana

dilakukan fase partisi dan hirarki secara bergantian, metode ini dikenal dengan metode Hopach. Tahap partisi dapat dilakukan dengan metode PAM, SOM, atau K-Means. Proses partisi dilanjutkan dengan proses Ordered, baru kemudian dikoreksi dengan proses agglomorative, sehingga hasil pengelompokan menjadi lebih akurat. Dalam menentukan kelompok utama digunakan ukuran MSS (Median Split Silhouette). MSS mengukur homogenitas hasil pengelompokan,

dimana hasil pengelompokan yang dipilih adalah yang meminimumkan MSS. Pada pengelompokan 136 barisan DNA Virus Ebola dari GeneBank. Proses

awalnya dilakukan pensejajaran global, dan dilanjutkan dengan perhitungan jarak genetik dengan menggunakan koreksi Jukes-Cantor. Pada penelitian ini didapat jarak genetik maksimum adalah 0.6153407 sedangkan jarak genetik minimum adalah 0. Selanjutnya matriks jarak genetik dapat dijadikan dasar untuk mengelompokkan barisan-barisan tersebut dengan menggunakan metode Hopach. Pada hasil pengelompokan Hopach-PAM, diperoleh kelompok utama sebanyak 10 kelompok dengan nilai MSS sebesar 0,8873843. Kelompok-kelompok virus ebola dapat diidentifikasikan berdasarkan subspesies dan tahun pertama kali mewabah.

Proses pensejajaran global dan pengelompokan Hopach-PAM menggunakan bantuan program open source R.

One goal in the study of gene expression (DNA/Protein) is finding biologically important subsets and clusters of genes. Clustering these genes can be achieved by hierarchical and partitioning methods. Both clustering methods can be combined, where partition and hierarchy phases can be executed alternately, this method is known as a Hopach method. The partitioning step can be done by the PAM, SOM, or K-Means clustering method. The partition process continued with the process of Ordered, then corrected with agglomorative process, so that the clustminering results become more accurate. The main clusters determine by using MSS
(Median Split Silhouette). MSS is used to measure homogeneity of the clustering result, in which the clustering is selected to minimize its MSS. The clustering procceses of 136 DNA sequences of Ebola virus, are started by performing a global alignment, and continued with the genetic distance calculations using
Jukes-Cantor correction. In this study, the maximum genetic distance found was 0.6153407 and the minimum was 0. The genetic distance matrix then serves as the basis for clustering the sequences with the Hopach-PAM method. From the clustering results, we obtained 10 major clusters with an MSS value of 0.8873843. The Ebola virus clusters can be identified by subspecies and by the year in which their outbreak first occurred. We implemented the global alignment process and the Hopach-PAM clustering algorithm in the open-source program R.]"
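
The Jukes-Cantor correction mentioned in the abstract above converts the observed proportion of differing sites p between two aligned sequences into an estimated number of substitutions per site, d = -(3/4) ln(1 - (4/3) p). The following is a minimal Python sketch of that formula only, not the thesis's R implementation. The function names `p_distance` and `jukes_cantor`, the choice to skip gap columns, and the toy sequences are assumptions made for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The logarithm is undefined here: the sequences are saturated.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, not taken from the Ebola data set).
    d = jukes_cantor("ACGTACGT", "ACGTACGA")
    print(round(d, 4))  # 0.1367
```

Identical sequences give a distance of 0, which is consistent with the minimum reported in the abstract. The corrected distance always exceeds the raw proportion p, because it accounts for multiple substitutions at the same site.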

Depok: Faculty of Mathematics and Natural Sciences, Universitas Indonesia, 2015

T43650

UI - Thesis (Membership) Universitas Indonesia Library

Saiful Bahri Musa

Document clustering by dynamic hierarchical algorithm based on fuzzy set type-ii from frequent item set

"One way to facilitate information retrieval is to cluster the existing collection of documents. Text documents are often unstructured: their formats vary and their groupings are ambiguous, which makes information retrieval difficult. Moreover, new documents emerge every second and need to be clustered. Static document clustering methods generally cluster documents only after the whole collection has been gathered, and re-clustering the whole collection whenever a new document arrives is inefficient. In this paper, we propose a new method for document clustering that uses a dynamic hierarchical algorithm based on type-II fuzzy sets from frequent itemsets. The method has three main phases: key term determination, extraction of candidate clusters, and construction of the cluster hierarchy. In the experiments, it achieved an F-measure of 0.40 for Newsgroup, 0.62 for Classic, and 0.38 for Reuters. Meanwhile, the computation time when a new document is added is lower than that of the previous static method. The results show that this method is suitable for producing hierarchical clustering solutions effectively and efficiently in a dynamic environment, and that it also gives accurate clustering results.

One way to ease the information retrieval process is to cluster the existing collection of documents. Existing text documents are often unstructured, their formats vary, and their groupings are ambiguous. This makes the information retrieval process difficult. In addition, new documents are added every second and need to be grouped. In general, static document clustering methods cluster documents after the entire collection has been gathered. However, re-clustering the entire collection when a new document arrives makes the clustering process inefficient. This study proposes a new method for document clustering with a dynamic hierarchical algorithm based on fuzzy set type-II from frequent itemsets. To achieve this goal, three main phases are carried out: key term extraction, candidate cluster extraction, and cluster hierarchy construction. The experiments yielded F-measure values of 0.40 for Newsgroup, 0.62 for Classic, and 0.38 for Reuters, while the computation time when documents are added was reduced compared with the previous static method. The experimental results on several document collection datasets show that this method is not only suitable for producing hierarchical clustering solutions effectively and efficiently in a dynamic environment, but also gives accurate clustering results."

Surabaya: Institut Teknologi Sepuluh Nopember, Faculty of Information Technology, Department of Informatics Engineering, 2016

AJ-Pdf

Journal Article Universitas Indonesia Library
