University of Belgrade - Faculty of Agriculture
AgroSpace - Faculty of Agriculture Repository

Semi-automatic extraction of multiword terms from domain-specific corpora

Authorized Users Only
2018
Authors
Pajić, Vesna
Vujicić-Stanković, Stasa
Stanković, Ranka
Pajić, Miloš
Article (Published version)
Abstract
Purpose: A hybrid approach is presented, which combines linguistic and statistical information to semi-automatically extract multiword term candidates from texts.

Design/methodology/approach: The method is designed to be domain and language independent, focusing on languages with rich morphology. Here, as a use case, it is applied to extracting multiword terms from Serbian texts in the agricultural engineering domain. Predefined syntactic structures were used for multiword terms. For each structure, a finite state transducer was developed, which recognizes text sequences having that structure and outputs the sequence in a normalized form, so that different inflectional forms of the same multiword term can be counted properly. Term candidates were further filtered by their frequencies and evaluated by two domain experts.

Findings: By using language resources, such as electronic dictionaries and grammars, 928 multiword terms were extracted out of 1,523 multiword terms that were recognized as candidates from a corpus having 42,260 different simple word forms; 870 of these were new, not already contained in the existing electronic dictionary of compounds for Serbian, and they were used to enrich the dictionary.

Originality/value: The paper presents a methodology that can significantly contribute to the development of terminology lexicons in different areas. In this particular use case, some important agricultural engineering concepts were extracted from the text, but this approach could be used for other domains and languages as well.
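The pipeline described in the abstract (recognize sequences that match a predefined syntactic structure, normalize them so that inflected variants are counted together, then filter by frequency) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses finite state transducers and electronic dictionaries, whereas the sketch below substitutes a plain dictionary lookup and part-of-speech pattern matching. The names `LEXICON`, `PATTERNS`, and `extract_candidates`, the toy word list, and the `min_freq` threshold are all hypothetical. The lemma sequence is used only as a counting key, not as a grammatically correct citation form.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: inflected form -> (lemma, part of speech).
# A real system would use a full electronic dictionary instead.
LEXICON = {
    "poljoprivredna": ("poljoprivredni", "A"),
    "poljoprivredne": ("poljoprivredni", "A"),
    "poljoprivrednih": ("poljoprivredni", "A"),
    "mašina": ("mašina", "N"),
    "mašine": ("mašina", "N"),
    "obrada": ("obrada", "N"),
    "obradu": ("obrada", "N"),
    "zemljišta": ("zemljište", "N"),
}

# Assumed syntactic structures: adjective + noun, noun + noun.
PATTERNS = [("A", "N"), ("N", "N")]


def extract_candidates(text, min_freq=2):
    """Count lemma-normalized matches of PATTERNS and keep the frequent ones."""
    tokens = re.findall(r"\w+", text.lower())
    tagged = [LEXICON.get(tok) for tok in tokens]  # None for unknown words
    counts = Counter()
    for i in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[i:i + len(pattern)]
            if len(window) < len(pattern) or None in window:
                continue
            if tuple(pos for _, pos in window) == pattern:
                # Normalize: join lemmas so inflected variants share one key.
                counts[" ".join(lemma for lemma, _ in window)] += 1
    # Frequency filter: rare matches are dropped before expert review.
    return {term: n for term, n in counts.items() if n >= min_freq}


text = (
    "Poljoprivredna mašina radi. Kupili su poljoprivredne mašine. "
    "Obrada zemljišta traje, a obradu zemljišta prati servis "
    "poljoprivrednih mašina."
)
candidates = extract_candidates(text)
```

On this toy text, three different inflected forms of the adjective-noun phrase collapse into one key, and the two forms of the noun-noun phrase collapse into another. The sketch ignores sentence boundaries, so it also matches the spurious pair spanning "mašine. Obrada"; it occurs only once and is removed by the frequency filter, which hints at why the method pairs frequency filtering with evaluation by domain experts. For the figures reported in the abstract, 928 accepted terms out of 1,523 candidates corresponds to roughly 61% of candidates being confirmed.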

Keywords:
Digital documents / Data analysis / Evaluation / Information retrieval / Data processing / Foreign languages / Data retrieval / Document handling
Source:
Electronic Library, 2018, 36, 3, 550-567
Publisher:
  • Emerald Group Publishing Ltd, Bingley
Funding / projects:
  • Serbian Language and Its Resources: Theory, Description and Applications (RS-178006)
  • Infrastructure for Technology Enhanced Learning in Serbia (RS-47003)

DOI: 10.1108/EL-06-2017-0128

ISSN: 0264-0473

WoS: 000434773400011

Scopus: 2-s2.0-85047317443
URI
http://aspace.agrif.bg.ac.rs/handle/123456789/4786
Collections
  • Radovi istraživača / Researchers’ publications
Institution/Community
Poljoprivredni fakultet

DSpace software copyright © 2002-2015  DuraSpace
About the AgroSpace Repository | Send Feedback
