
Search Results

Found 5 documents matching the query
M. Adhika Putra
Abstract:
Most of the information circulating on the internet is video content. This information needs to be analyzed because not all of it is benign: many videos with harmful content spread widely and can be accessed by anyone with an internet connection. In this research, a classification system for YouTube videos was built using the Symbolic Distance and Focal Point methods with the MapReduce programming model on Hadoop. The system identifies the tags attached to each YouTube video and compares them against a co-occurrence matrix to compute a symbolic distance value for the video; the Focal Point method is applied to improve the accuracy and focus of the classification. The research also measures the processing speed of the system on Hadoop and investigates the factors that affect it. Three test scenarios were run, varying the InputSplit size, the number of nodes, and the YARN configuration, each with three file sizes (500 MB, 1 GB, and 1.5 GB) containing 58,718, 119,697, and 160,395 tags respectively. For the 500 MB, 1 GB, and 1.5 GB files, increasing the number of nodes from two to three reduced the average processing time by 0.2, 5, and 16.3 seconds respectively, while tuning the YARN configuration reduced it by up to 47, 277.1, and 354.3 seconds. The tests also show that the smaller the InputSplit, the faster the MapReduce processing; however, if the mappers cannot handle the number of splits, processing becomes slower than before.
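The abstract does not spell out the symbolic-distance formula, only that tag co-occurrence drives it. As a rough illustration of the underlying idea (tags that frequently co-occur are semantically close, so their "distance" is small), here is a minimal Python sketch; the function names and the 1 - (co-occurrences / total videos) formula are illustrative assumptions, not the thesis's actual method:

```python
from itertools import combinations

def cooccurrence_counts(tag_lists):
    """Count how often each pair of tags appears together across videos."""
    counts = {}
    for tags in tag_lists:
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def symbolic_distance(tag_a, tag_b, counts, total_videos):
    """Toy distance: tags that co-occur often are close (distance near 0)."""
    pair = tuple(sorted((tag_a, tag_b)))
    return 1.0 - counts.get(pair, 0) / total_videos

# Tiny hypothetical tag sets for three videos
videos = [
    ["music", "concert", "live"],
    ["music", "guitar"],
    ["gaming", "live"],
]
counts = cooccurrence_counts(videos)
print(symbolic_distance("music", "guitar", counts, len(videos)))
```

In the thesis, such per-tag computations would be distributed as MapReduce jobs on Hadoop rather than run in a single process.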
Depok: Fakultas Teknik Universitas Indonesia, 2016
S63260
UI - Skripsi Membership  Universitas Indonesia Library
Ramanti Dharayani
Abstract:
Advances in biotechnology have produced huge volumes of data that must be processed when searching for anomalies, especially in genomic data, and the complexity of that processing grows along with the data. Big Data technology is one solution to this growth in data volume and processing complexity. This study examines how Big Data can be used to search for anomalies in genomic data using MapReduce and the BLAST algorithm on a Big Data platform, and finds that the search can be performed in less time. Biofarma Corp should adopt big data in vaccine and serum development by analyzing genomic sequences for anomalies. At the root of the problem, an anomaly search requires about 1.62 terabytes of transient primary data and 301 gigabytes of secondary data to analyze genomic variance; moreover, Biofarma Corp spent 16 hours on a single anomaly search over 3 terabytes of vaccine data. This study proposes a big data implementation that handles anomaly-search processes while prioritizing lower time complexity and storage use, guided by the research question "How is big data technology applied in searching for anomalies in genomic data?". It aims to implement a big data system that accommodates large, complex data to support Biofarma Corp's business processes, adopting the architecture framework of Demchenko, Ngo, and Membrey. The study designs a data flow from FASTA and FASTQ sources into the big data system for the anomaly-search processes. As its main contribution, the research uses the MapReduce framework to run the BLAST algorithm in less time: MapReduce handled 21, 33, and 55 K-mers in four minutes respectively, while 50 minutes were needed without MapReduce.
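BLAST itself is far more involved than the abstract can convey, but the map/reduce decomposition it rides on can be illustrated with k-mer counting, a typical preprocessing step for sequence search. This Python sketch is illustrative only; `map_kmers` and `reduce_counts` are hypothetical names standing in for Hadoop's map and reduce phases:

```python
from collections import Counter

def map_kmers(sequence, k=3):
    """Map step: emit every k-mer (substring of length k) in one read."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def reduce_counts(emitted):
    """Reduce step: aggregate the emitted k-mers into per-key counts."""
    return Counter(emitted)

# Two short reads, as might come from a FASTA/FASTQ source
reads = ["ACGTAC", "GTACGT"]
emitted = [kmer for read in reads for kmer in map_kmers(read)]
counts = reduce_counts(emitted)
print(counts["GTA"])
```

On a real cluster, the map step runs in parallel across input splits and Hadoop's shuffle groups identical k-mers before the reduce step, which is what yields the speedups reported above.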
Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2020
TA-Pdf
UI - Tugas Akhir  Universitas Indonesia Library
Priagung Khusumanegara
Abstract:
Distributed computing is one of the technological advances in data processing; it allows users to process data on several physically separate (distributed) computers. One technology built on this concept is Hadoop, an open-source, Java-based software framework for processing large data sets in a distributed manner. Hadoop uses an application and programming framework called MapReduce. Six scenarios were implemented to analyze the speed of MapReduce on Hadoop. The tests show that increasing the number of physical machines from one to two, with the specifications used in the design, speeds up the average MapReduce time: for files of 512 MB, 1 GB, 1.5 GB, and 2 GB, adding a physical machine reduced the average time by 161.34, 328.00, 460.20, and 525.80 seconds respectively. In contrast, increasing the number of virtual machines from one to two slowed MapReduce down: for the same file sizes, adding a virtual machine increased the average time by 164.00, 504.34, 781.27, and 1070.46 seconds respectively. The measurements also show that the block size and the number of map slots in Hadoop affect MapReduce speed.
Depok: Fakultas Teknik Universitas Indonesia, 2014
S55394
UI - Skripsi Membership  Universitas Indonesia Library
Billy Surya Putra
Abstract:
A recommender system is a technique for providing suggestions that may be useful to a user, in the form of products or services the user has not yet used or purchased. Recommender systems, particularly those based on K-Nearest Neighbors (KNN), have achieved considerable success in recent years. In this research, K-Nearest Neighbors is implemented on the MapReduce distributed computing model to build a recommender system using Item-Based Collaborative Filtering (IBCF) and User-Based Collaborative Filtering (UBCF) on the MovieLens 100k dataset. Several similarity measures are compared: cosine-based similarity, Pearson correlation similarity, and Euclidean distance. The experiments show that Euclidean distance gives the best performance in both processing time and accuracy: with IBCF it required an average processing time of 13 seconds with a correlation value of 0.84, and with UBCF an average of 32 seconds with a correlation value of 0.84.
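The three similarity measures compared in the thesis can be sketched compactly in Python. The abstract does not give the exact normalization used, so the mapping of Euclidean distance into a similarity score via 1 / (1 + d) is a common convention assumed here, not necessarily the thesis's choice:

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_sim(a, b):
    """Pearson correlation: cosine similarity of mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_sim([x - mean_a for x in a], [y - mean_b for y in b])

def euclidean_sim(a, b):
    """Euclidean distance mapped into (0, 1]; identical vectors give 1."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

u = [5, 3, 4, 4]   # ratings given to four items by user u
v = [4, 3, 5, 3]   # ratings given to the same items by user v
print(round(cosine_sim(u, v), 3))
print(round(euclidean_sim(u, v), 3))
```

In UBCF these vectors are co-rated items of two users; in IBCF they are ratings of two items by common users. The KNN step then keeps the k neighbors with the highest similarity.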
Depok: Fakultas Teknik Universitas Indonesia, 2017
S68737
UI - Skripsi Membership  Universitas Indonesia Library
Steele, Brian
Abstract:
This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses. This book has three parts. (a) Data Reduction: begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter. (b) Extracting Information from Data: linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of healthcare analytics for an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in utilizing the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System. (c) Predictive Analytics: two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.
Switzerland: Springer International Publishing, 2016
e20510037
eBooks  Universitas Indonesia Library