Automated text analysis allows researchers to analyze large quantities of text. Yet comparative researchers face a major challenge: people in different countries speak different languages. To address this issue, some analysts have suggested using Google Translate to convert all texts into English before starting the analysis (Lucas et al.). But in doing so, do we get lost in translation? This paper evaluates the usefulness of machine translation for bag-of-words models, such as topic models. We use the europarl dataset and compare term-document matrices (TDMs), as well as topic model results, from gold-standard translated text and machine-translated text. We evaluate results at both the document and the corpus level. We first find the TDMs for both text corpora to be highly similar, with minor differences across languages. What is more, we find considerable overlap in the set of features generated from human-translated and machine-translated texts. With regard to LDA topic models, we find topical prevalence and topical content to be highly similar, again with only small differences across languages. We conclude that Google Translate is a useful tool for comparative researchers when using bag-of-words text models.

The Arabic language lacks semantic datasets and sense inventories. The most common semantically labeled dataset for Arabic is ArabGlossBERT, a relatively small dataset of 167K context-gloss pairs (about 60K positive and 107K negative pairs) collected from Arabic dictionaries. This paper presents an enrichment of the ArabGlossBERT dataset, augmenting it through Arabic-English-Arabic machine back-translation. Augmentation increased the dataset size to 352K pairs (149K positive and 203K negative pairs). We measure the impact of augmentation by fine-tuning BERT on the target sense verification (TSV) task under different data configurations. Overall, accuracy ranges from 78% to 84% across data configurations. Although our approach performed on par with the baseline, we did observe improvements for some part-of-speech (POS) tags in some experiments. Furthermore, our fine-tuned models are trained on a larger dataset covering a larger vocabulary and more contexts. We provide an in-depth analysis of the accuracy for each POS.
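The europarl comparison of human-translated and machine-translated text works at both the document and the corpus level. The sketch below shows what such a comparison can look like. It is a minimal illustration under stated assumptions, not the paper's method. The names `term_counts`, `cosine` and `compare_corpora` are hypothetical. Cosine similarity between term-count vectors and the Jaccard index of the two vocabularies are choices made here for illustration. The tokenizer keeps only lowercase English alphabetic tokens.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Assumption: English-language documents; keep lowercase alphabetic tokens only.
TOKEN_RE = re.compile(r"[a-z]+")


def term_counts(doc: str) -> Counter:
    """One row of a term-document matrix: token -> count for a single document."""
    return Counter(TOKEN_RE.findall(doc.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def compare_corpora(human: List[str], machine: List[str]) -> Dict[str, object]:
    """Compare a human-translated corpus with its machine-translated counterpart.

    Both lists must be aligned: human[i] and machine[i] translate the same source document.
    """
    if len(human) != len(machine):
        raise ValueError("corpora must be aligned document by document")
    human_rows = [term_counts(d) for d in human]
    machine_rows = [term_counts(d) for d in machine]

    # Document level: similarity of each aligned pair of TDM rows.
    document_cosine = [cosine(h, m) for h, m in zip(human_rows, machine_rows)]

    # Corpus level: collapse each TDM into one vector of total term counts.
    human_total = sum(human_rows, Counter())
    machine_total = sum(machine_rows, Counter())

    # Feature overlap: Jaccard index of the two vocabularies.
    shared = human_total.keys() & machine_total.keys()
    union = human_total.keys() | machine_total.keys()
    return {
        "document_cosine": document_cosine,
        "corpus_cosine": cosine(human_total, machine_total),
        "feature_jaccard": len(shared) / len(union) if union else 1.0,
    }
```

A real replication would typically also apply stemming, stop-word removal and frequency trimming before building the matrices. Those preprocessing choices can move all three scores noticeably.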
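The ArabGlossBERT enrichment relies on Arabic-English-Arabic machine back-translation. The sketch below shows that idea applied to context-gloss pairs. It is a minimal sketch under loud assumptions, not the authors' pipeline. The names `ContextGlossPair`, `back_translate`, `augment` and the `translate(text, source_lang, target_lang)` callable are all hypothetical. No real translation API is called. Several choices are also assumptions:

- only the context sentence is back-translated;
- the gloss and label are copied unchanged;
- exact duplicates are dropped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical signature: translate(text, source_lang, target_lang) -> translated text.
# Plug in any machine-translation client here; none is called in this sketch.
Translator = Callable[[str, str, str], str]


@dataclass(frozen=True)
class ContextGlossPair:
    context: str  # Arabic sentence containing the target word
    gloss: str    # dictionary gloss for one candidate sense
    label: int    # 1 = gloss matches the sense used in the context, 0 = it does not


def back_translate(text: str, translate: Translator, pivot: str = "en") -> str:
    """Round-trip Arabic text through a pivot language to obtain a paraphrase."""
    return translate(translate(text, "ar", pivot), pivot, "ar")


def augment(pairs: Iterable[ContextGlossPair], translate: Translator) -> List[ContextGlossPair]:
    """Return the original pairs plus one back-translated variant per pair.

    Assumptions: only the context is back-translated, the gloss and label are
    copied unchanged, and variants identical to an existing pair are dropped.
    """
    original = list(pairs)
    result = list(original)
    seen = {(p.context, p.gloss) for p in original}
    for p in original:
        variant = ContextGlossPair(back_translate(p.context, translate), p.gloss, p.label)
        key = (variant.context, variant.gloss)
        if key not in seen:
            seen.add(key)
            result.append(variant)
    return result
```

Passing the translator in as a callable keeps the logic testable offline with a stub. Copying labels assumes the round trip preserves the target word's sense, which back-translation does not guarantee: the target word may come back as a synonym or disappear. The sketch does not attempt to reproduce the 352K-pair count reported above.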