Basic item record

dc.creator: Pajić, Vesna
dc.creator: Vujicić-Stanković, Stasa
dc.creator: Stanković, Ranka
dc.creator: Pajić, Miloš
dc.date.accessioned: 2020-12-17T22:21:28Z
dc.date.available: 2020-12-17T22:21:28Z
dc.date.issued: 2018
dc.identifier.issn: 0264-0473
dc.identifier.uri: http://aspace.agrif.bg.ac.rs/handle/123456789/4786
dc.description.abstract: Purpose: A hybrid approach is presented, which combines linguistic and statistical information to semi-automatically extract multiword term candidates from texts. Design/methodology/approach: The method is designed to be domain and language independent, focusing on languages with rich morphology. Here, it is used for extracting multiword terms from texts in Serbian, belonging to the agricultural engineering domain, as a use case. Predefined syntactic structures were used for multiword terms. For each structure, a finite state transducer was developed, which recognizes text sequences having that structure and outputs the sequence in a normalized form, so that different inflectional forms of the same multiword term can be counted properly. Term candidates were further filtered by their frequencies and evaluated by two domain experts. Findings: By using language resources, such as electronic dictionaries and grammars, 928 multiword terms were extracted out of 1,523 multiword terms that were recognized as candidates from a corpus having 42,260 different simple word forms; 870 of these were new, not already contained in the existing electronic dictionary of compounds for Serbian, and they were used to enrich the dictionary. Originality/value: The paper presents methodology that can significantly contribute to the development of terminology lexicons in different areas. In this particular use case, some important agricultural engineering concepts were extracted from the text, but this approach could be used for other domains and languages as well. (en)
dc.publisher: Emerald Group Publishing Ltd, Bingley
dc.relation: info:eu-repo/grantAgreement/MESTD/Basic Research (BR or ON)/178006/RS//
dc.relation: info:eu-repo/grantAgreement/MESTD/Integrated and Interdisciplinary Research (IIR or III)/47003/RS//
dc.rights: restrictedAccess
dc.source: Electronic Library
dc.subject: Digital documents (en)
dc.subject: Data analysis (en)
dc.subject: Evaluation (en)
dc.subject: Information retrieval (en)
dc.subject: Data processing (en)
dc.subject: Foreign languages (en)
dc.subject: Data retrieval (en)
dc.subject: Document handling (en)
dc.title: Semi-automatic extraction of multiword terms from domain-specific corpora (en)
dc.type: article
dc.rights.license: ARR
dc.citation.epage: 567
dc.citation.issue: 3
dc.citation.other: 36(3): 550-567
dc.citation.rank: M23
dc.citation.spage: 550
dc.citation.volume: 36
dc.identifier.doi: 10.1108/EL-06-2017-0128
dc.identifier.scopus: 2-s2.0-85047317443
dc.identifier.wos: 000434773400011
dc.type.version: publishedVersion



