Search Results

Found 18,400 documents matching the query

McEnery, Tony, 1964-

Corpus linguistics : method, theory and practice / Tony McEnery, Andrew Hardie

""Corpus linguistics is the study of language data on a large scale - the computer-aided analysis of very extensive collections of transcribed utterances or written texts. This textbook outlines the basic methods of corpus linguistics, explains how the discipline of corpus linguistics developed, and surveys the major approaches to the use of corpus data. It uses a broad range of examples to show how corpus data has led to methodological and theoretical innovation in linguistics in general. Clear and detailed explanations lay out the key issues of method and theory in contemporary corpus linguistics. A structured and coherent narrative links the historical development of the field to current topics in 'mainstream' linguistics. Practical activities and questions for discussion at the end of each chapter encourage students to test their understanding of what they have read and an extensive glossary provides easy access to definitions of technical terms used in the text"--"

Cambridge, UK: Cambridge University Press, 2013

410.188 MCE c (1);410.188 MCE c (2);410.188 MCE c (2)

Buku Teks SO Universitas Indonesia Library

McEnery, Tony, 1964-

Corpus linguistics : method, theory and practice

"Corpus linguistics is the study of language data on a large scale - the computer-aided analysis of very extensive collections of transcribed utterances or written texts. This textbook outlines the basic methods of corpus linguistics, explains how the discipline of corpus linguistics developed, and surveys the major approaches to the use of corpus data. It uses a broad range of examples to show how corpus data has led to methodological and theoretical innovation in linguistics in general. Clear and detailed explanations lay out the key issues of method and theory in contemporary corpus linguistics. A structured and coherent narrative links the historical development of the field to current topics in ?mainstream? linguistics. Practical activities and questions for discussion at the end of each chapter encourage students to test their understanding of what they have read and an extensive glossary provides easy access to definitions of technical terms used in the text."

Cambridge, UK: Cambridge University Press, 2012

410.188 MCE c (1);410.188 MCE c (2)

Buku Teks SO Universitas Indonesia Library

Adhitya Alkautsar

Penggunaan dan perubahan ranah sumber pada metafora sepakbola dalam artikel piala dunia 1966-2014 di Kompas: sebuah kajian metafora konseptual berbasis korpus = The use and change of source domain on soccer metaphors in Kompas world cup 1966-2014 articles: a corpus-based study of conceptual metaphors / Adhitya Alkautsar

"ABSTRAK

Penelitian ini bertujuan menemukan ranah sumber, menjelaskan kekerapan penggunaan ranah sumber, mengungkapkan perubahan penggunaan dan pergeseran ranah sumber, serta menyelidiki kondisi sosial yang mendasari metafora-metafora yang digunakan Kompas untuk mendeskripsikan sepakbola dalam pemberitaan Piala Dunia 1966-2014. Dalam penelitian ini, Teori Metafora Konseptual Lakoff dan Johnson (1980) diterapkan melalui metodologi linguistik korpus. Antara periode 1966-2014, ditemukan total 60 ranah sumber yang digunakan, tetapi hanya delapan yang menunjukkan tingkat kekerapan tertinggi, yakni PEPERANGAN, POLITIK INTERNASIONAL, PERMAINAN, KESENIAN, PERIODE WAKTU, SEJARAH, GERAK ROTASI, dan PEMERINTAHAN yang berkaitan dengan hegemoni maskulinitas dalam masyarakat. Penggunaan tiap ranah bersifat fluktuatif, dan pergeseran ranah hanya fenomena minor.

ABSTRACT

This study aims to identify the source domains of the soccer metaphors used in Kompas' coverage of the World Cup from 1966 to 2014, describe how frequently each domain is used, examine changes and shifts in domain use, and investigate the social conditions underlying these metaphors. Lakoff and Johnson's Conceptual Metaphor Theory (1980) is applied through corpus linguistics methodology. Of the 60 source domains found, eight are the most frequently used, namely WAR, INTERNATIONAL POLITICS, GAMES, ART, TIME PERIOD, HISTORY, ROTATIONAL MOTION, and GOVERNMENT, which relates to hegemonic masculinity in society. The use of every domain fluctuates, and domain shift is only a minor phenomenon."

2017

T48677

UI - Tesis Membership Universitas Indonesia Library

Corpus linguistics: an introduction

Edinburgh : Edinburgh University Press , 2001

410.2 COR

Buku Teks SO Universitas Indonesia Library

McEnery, Tony, 1964-

Corpus-based language studies: an advanced resource book

London: Routledge, 2010

410 MCE c

Buku Teks SO Universitas Indonesia Library

Mikhailov, Mikhail

Corpus Linguistics for Translation and Contrastive Studies : a guide for research

London : Routledge, 2017

418.020 285 MIK c

Buku Teks SO Universitas Indonesia Library

Weisser, Martin

Practical corpus linguistics : an introduction to corpus-based language analysis

Chichester: MA Wiley Blackwell, 2016

410.188 WEI p

Buku Teks SO Universitas Indonesia Library

Vachek, Josef

The Linguistic school of Prague; an introduction to its theory and practice

Bloomington: Indiana University Press, 1966

400 VAC l

Buku Teks SO Universitas Indonesia Library

Hunston, Susan

Corpora in applied linguistics

"This book explores corpus linguistics in language learning and research."

New York : Cambridge University Press, 2006

418 HUN c

Buku Teks SO Universitas Indonesia Library

Ika Alfina

Development of Annotation Guidelines, Treebanks, and a Tree Rotation Method that Conform to Universal Dependencies v2 for Indonesian Dependency Parsing = Pengembangan Petunjuk Anotasi, Treebank dan Metode Rotasi Tree yang Mengacu ke Universal Dependencies v2 untuk Dependency Parsing Bahasa Indonesia

"Pada penelitian ini, kami ingin mengatasi masalah langkanya dataset untuk peneli- tian di bidang syntactic parsing untuk Bahasa Indonesia, terutama kurang tersedi- anya dependency treebank berbahasa Indonesia dalam kualitas yang baik. Adapun tujuan dari penelitian ada tiga: 1) mengusulkan petunjuk cara menganotasi depen- dency trebank untuk Bahasa Indonesia yang mengacu kepada aturan anotasi UD v2, 2) membangun dependency treebank yang dianotasi secara manual agar bisa berperan sebagai gold standard, 3) membangun sebuah dependency treebank de- ngan mengkonversi secara otomatis sebuah constituency treebank menjadi sebuah dependency treebank.

Kami sudah membuat panduan anotasi untuk membangun dependency treebank untuk Bahasa Indonesia yang mengacu kepada aturan UD v2. Pedoman tersebut mencakup aturan tokenisasi/segmentasi kata, pelabelan kelas kata (POS tagging), analisis fitur morfologi, dan anotasi hubungan dependency antarkata. Kami mengusulkan bagaimana memproses klitika, kata ulang, dan singkatan pada tahap tokenisasi/segmentasi kata. Pada tahapan penentuan kelas kata, kami mengusulkan pemetaan dari daftar kata dalam Bahasa Indonesia ke 17 kelas kata yang didefinisikan oleh UD v2. Untuk anotasi fitur morfologi, kami telah memilih 14 dari 24 fitur morfologi UD v2 yang dinilai sesuai dengan aturan Bahasa Indonesia, berikut dengan 27 buah label feature-value yang bersesuaian dengan fitur morfologi terkait. Untuk anotasi hubungan dependency antarkata, kami mengusulkan penggunaan 14 buah label yang bersifat language-specific untuk menganotasi struktur sintaks yang khusus terdapat pada Bahasa Indonesia.

Sebuah dependency treebank berbahasa Indonesia yang bisa digunakan sebagai gold standard sudah berhasil dibangun. Treebank ini dibuat dengan merevisi secara manual sebuah dependency treebank yang sudah ada. Revisi dilakukan dalam dua fase. Pada fase pertama dilakukan koreksi terhadap tokenisasi/segmentasi kata, pelabelan kelas kata, dan anotasi terhadap hubungan dependency antarkata. Pada fase kedua, selain dilakukan sedikit koreksi untuk perbaikan pada tahap satu, ditambahkan juga informasi kata dasar (lemma) dan fitur morfologi. Evaluasi terhadap kualitas treebank yang baru dilakukan dengan membangun model dependency parser menggunakan UDPipe. Hasil pengujian menunjukkan bahwa kami berhasil meningkatkan kualitas treebank, yang ditunjukkan dengan naiknya UAS sebanyak 9% dan LAS sebanyak 14%.

Terkait tujuan penelitian ketiga, kami juga sudah membangun sebuah treebank baru dengan mengkonversi secara otomatis sebuah constituency treebank ke dependency treebank. Pada proyek ini, kami mengusulkan sebuah metode rotasi tree yang bertujuan mengubah dependency tree awal yang dihasilkan oleh alat NLP untuk Bahasa Inggris bernama Stanford UD converter sedemikian agar head-directionality dari frase kata benda yang dihasilkan sesuai dengan aturan Bahasa Indonesia yang umumnya bersifat head-initial. Kami menamakan algoritma yang dihasilkan sebagai algoritma headSwap dan algoritma compound. Hasil percobaan menunjukkan bahwa metode rotasi tree yang diusulkan berhasil meningkatkan performa UAS sebanyak 32.5%.

In this dissertation, we address the lack of resources for Indonesian syntactic parsing research, especially the need for better-quality Indonesian dependency treebanks. This work has three objectives: 1) to propose annotation guidelines for Indonesian dependency treebanks that conform to the UD v2 annotation guidelines, 2) to build a gold standard dependency treebank, 3) to build a silver standard dependency treebank by automatically converting an existing Indonesian constituency treebank to a dependency treebank.

We have proposed a set of annotation guidelines for Indonesian dependency treebanks that conform to UD v2. The guidelines cover tokenization/word segmentation, POS tagging, morphological feature analysis, and dependency annotation. We proposed how to handle Indonesian clitics/multiword tokens, reduplication, and abbreviations in word segmentation. For POS tagging, we presented a mapping from the UD v2 guidelines to the Indonesian lexicon. For morphological features, we proposed the use of 14 of the 24 UD v2 morphological features along with 27 UD v2 feature-value tags for Indonesian grammar. Finally, for dependency annotation, we proposed using 14 language-specific relations to annotate the structures particular to Indonesian grammar.

A gold standard Indonesian dependency treebank has also been built based on our proposed annotation guidelines. The gold standard was constructed by manually revising an existing Indonesian dependency treebank. The revision project consisted of two phases. Major revision of word segmentation, POS tagging, and dependency relation annotation was conducted in the first phase. In the second phase, we added lemma information and morphological features. Finally, we evaluated the quality of the revised treebank by building a dependency parser using UDPipe. The experimental results show that we successfully improved the quality of the original treebank, with a margin of 9% for UAS and 14% for LAS.

Finally, we built a silver standard treebank by automatically converting an Indonesian constituency treebank to a dependency treebank. In this work, we proposed a method to improve the output of an English NLP tool named Stanford UD converter. We transformed the output so that it conforms to the head-directionality rule for noun phrases in Indonesian. We called the proposed tree rotation algorithm the headSwap method and the rule for noun phrases the compound rule. The evaluation shows that our proposed method improved the UAS by a margin of 32.5%."

Depok: Fakultas Ilmu Komputer Universitas Indonesia, 2021

D-pdf

UI - Disertasi Membership Universitas Indonesia Library
