Found 2 documents matching the query
Budi Irmawati
Abstract:
Large-scale annotated data written by second-language learners are not always available for low-resource languages such as Indonesian. To cope with data scarcity, it is important to generate ‘learner-like’ artificial error sentences when the available real learner data are insufficient and language experts cannot construct data. In this paper, we propose a new method for generating effective error-injected artificial data to augment training examples for preposition error correction tasks. Our method first generates a large amount of noisy artificial error data using a simple error injection method. It then selectively removes the uninformative (noisy) instances from the artificial data. We assume that ‘good’ artificial preposition error data would be effective training data for error correction tasks. Therefore, to evaluate the quality of the generated artificial data, we used them as training data to correct preposition errors in real learners’ sentences. The results of our study indicate that training on our artificial data improves preposition error correction performance. The results also show that training on a smaller set of good instances outperforms training on a much larger set of noisy instances, as well as training on sentences written by native speakers. This method is language-independent and easy to apply to other low-resource languages because it assumes only a small amount of learner error data and uses features that can be extracted automatically from linguistically annotated sentences.
Depok: Faculty of Engineering, Universitas Indonesia, 2017
UI-IJTECH 8:3 (2017)
Journal Article, Universitas Indonesia Library
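The two-step procedure in the abstract above (inject errors, then remove uninformative instances) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the preposition list, the function names, and in particular the filtering criterion (keeping only substitutions that were observed in a small sample of real learner errors) are hypothetical, since the abstract does not say how noisy instances are identified.

```python
# Hypothetical, tiny preposition inventory for illustration; not taken from the paper.
PREPOSITIONS = ["di", "ke", "dari", "pada", "untuk"]


def inject_errors(tokens, prepositions=PREPOSITIONS):
    """Step 1: create one corrupted copy of the sentence for every
    possible wrong preposition at every preposition position."""
    examples = []
    for i, tok in enumerate(tokens):
        if tok in prepositions:
            for wrong in prepositions:
                if wrong != tok:
                    corrupted = tokens[:i] + [wrong] + tokens[i + 1:]
                    examples.append(
                        {"tokens": corrupted, "index": i,
                         "correct": tok, "wrong": wrong}
                    )
    return examples


def filter_informative(examples, confusion_counts, min_count=1):
    """Step 2 (assumed criterion): keep an example only if its
    (correct, wrong) pair occurs at least min_count times in a
    small set of real learner errors."""
    return [
        ex for ex in examples
        if confusion_counts.get((ex["correct"], ex["wrong"]), 0) >= min_count
    ]


if __name__ == "__main__":
    sentence = ["Saya", "pergi", "ke", "pasar"]
    noisy = inject_errors(sentence)
    # Hypothetical counts of learner confusions: (correct, wrong) -> frequency.
    learner_confusions = {("ke", "di"): 3}
    for ex in filter_informative(noisy, learner_confusions):
        print(" ".join(ex["tokens"]), "->", ex["correct"])
```

Any scoring function could take the place of the frequency lookup; the point is only that the filtered set is much smaller than the raw injected set.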
Budi Irmawati
Abstract:
In languages with fixed word orders, syntactic information is useful when solving natural language processing (NLP) problems. In a language like Indonesian, however, which has a relatively free word order, the usefulness of syntactic information has yet to be determined. In this study, a dependency annotation scheme for extracting syntactic features from a sentence is proposed. This annotation scheme adapts the Stanford typed dependency (SD) annotation scheme to cope with such phenomena in the Indonesian language as ellipses, clitics, and non-verb clauses. This adapted annotation scheme is then extended to resolve ambiguities in assigning heads and relations that the adapted scheme could not avoid. The accuracy of these two annotation schemes is then compared, and the usefulness of the extended annotation scheme is assessed using the syntactic features extracted from dependency-annotated sentences in a preposition error correction task. The experimental results indicate that the extended annotation scheme improved the accuracy of a dependency parser, and the error correction task demonstrates that training data using syntactic features yield better correction than training data that do not use such features, thus answering the research question affirmatively.
Depok: Faculty of Engineering, Universitas Indonesia, 2017
UI-IJTECH 8:5 (2017)
Journal Article, Universitas Indonesia Library
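To make "syntactic features extracted from dependency-annotated sentences" in the second abstract more concrete, here is a minimal sketch of how such features might be read off a parsed sentence for one preposition. The token representation, the function name, the feature names, and the classic SD-style labels (`nsubj`, `prep`, `pobj`) are assumptions for illustration; they are not the adapted or extended annotation scheme from the paper.

```python
def preposition_features(sentence, index):
    """Collect simple syntactic features for the token at `index`.

    `sentence` is a list of dicts with keys 'form', 'head' and 'deprel',
    where 'head' is a 1-based token position and 0 marks the root.
    """
    tok = sentence[index]
    head = sentence[tok["head"] - 1] if tok["head"] > 0 else None
    # Children are the tokens whose head points at this token (1-based).
    children = [t for t in sentence if t["head"] == index + 1]
    return {
        "prep": tok["form"],
        "deprel": tok["deprel"],
        "head_form": head["form"] if head else "ROOT",
        "child_forms": [c["form"] for c in children],
    }


if __name__ == "__main__":
    # "Saya pergi ke pasar" with illustrative SD-style relations.
    parsed = [
        {"form": "Saya", "head": 2, "deprel": "nsubj"},
        {"form": "pergi", "head": 0, "deprel": "root"},
        {"form": "ke", "head": 2, "deprel": "prep"},
        {"form": "pasar", "head": 3, "deprel": "pobj"},
    ]
    print(preposition_features(parsed, 2))
```

Features of this kind (governing word, relation label, dependent noun) could be fed to a classifier alongside surface features such as neighbouring words.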