Serbian Language and Its Resources: Theory, Description and Applications

info:eu-repo/grantAgreement/MESTD/Basic Research (BR or ON)/178006/RS//

Serbian Language and Its Resources: Theory, Description and Applications (en)
Српски језик и његови ресурси: теорија, опис и примене (sr)
Srpski jezik i njegovi resursi: teorija, opis i primene (sr_RS)
Publications

Semi-automatic extraction of multiword terms from domain-specific corpora

Pajić, Vesna; Vujicić-Stanković, Stasa; Stanković, Ranka; Pajić, Miloš

(Emerald Group Publishing Ltd, Bingley, 2018)

TY  - JOUR
AU  - Pajić, Vesna
AU  - Vujicić-Stanković, Stasa
AU  - Stanković, Ranka
AU  - Pajić, Miloš
PY  - 2018
UR  - http://aspace.agrif.bg.ac.rs/handle/123456789/4786
AB  - Purpose: A hybrid approach is presented, which combines linguistic and statistical information to semi-automatically extract multiword term candidates from texts. Design/methodology/approach: The method is designed to be domain and language independent, focusing on languages with rich morphology. Here, it is used for extracting multiword terms from texts in Serbian, belonging to the agricultural engineering domain, as a use case. Predefined syntactic structures were used for multiword terms. For each structure, a finite state transducer was developed, which recognizes text sequences having that structure and outputs the sequence in a normalized form, so that different inflectional forms of the same multiword term can be counted properly. Term candidates were further filtered by their frequencies and evaluated by two domain experts. Findings: By using language resources, such as electronic dictionaries and grammars, 928 multiword terms were extracted out of 1,523 multiword terms that were recognized as candidates from a corpus having 42,260 different simple word forms; 870 of these were new, not already contained in the existing electronic dictionary of compounds for Serbian, and they were used to enrich the dictionary. Originality/value: The paper presents methodology that can significantly contribute to the development of terminology lexicons in different areas. In this particular use case, some important agricultural engineering concepts were extracted from the text, but this approach could be used for other domains and languages as well.
PB  - Emerald Group Publishing Ltd, Bingley
T2  - Electronic Library
T1  - Semi-automatic extraction of multiword terms from domain-specific corpora
EP  - 567
IS  - 3
SP  - 550
VL  - 36
DO  - 10.1108/EL-06-2017-0128
ER  - 
@article{
author = "Pajić, Vesna and Vujicić-Stanković, Stasa and Stanković, Ranka and Pajić, Miloš",
year = "2018",
abstract = "Purpose: A hybrid approach is presented, which combines linguistic and statistical information to semi-automatically extract multiword term candidates from texts. Design/methodology/approach: The method is designed to be domain and language independent, focusing on languages with rich morphology. Here, it is used for extracting multiword terms from texts in Serbian, belonging to the agricultural engineering domain, as a use case. Predefined syntactic structures were used for multiword terms. For each structure, a finite state transducer was developed, which recognizes text sequences having that structure and outputs the sequence in a normalized form, so that different inflectional forms of the same multiword term can be counted properly. Term candidates were further filtered by their frequencies and evaluated by two domain experts. Findings: By using language resources, such as electronic dictionaries and grammars, 928 multiword terms were extracted out of 1,523 multiword terms that were recognized as candidates from a corpus having 42,260 different simple word forms; 870 of these were new, not already contained in the existing electronic dictionary of compounds for Serbian, and they were used to enrich the dictionary. Originality/value: The paper presents methodology that can significantly contribute to the development of terminology lexicons in different areas. In this particular use case, some important agricultural engineering concepts were extracted from the text, but this approach could be used for other domains and languages as well.",
publisher = "Emerald Group Publishing Ltd, Bingley",
journal = "Electronic Library",
title = "Semi-automatic extraction of multiword terms from domain-specific corpora",
pages = "550-567",
number = "3",
volume = "36",
doi = "10.1108/EL-06-2017-0128"
}
Pajić, V., Vujicić-Stanković, S., Stanković, R., & Pajić, M. (2018). Semi-automatic extraction of multiword terms from domain-specific corpora. In Electronic Library, 36(3), 550-567. Emerald Group Publishing Ltd, Bingley.
https://doi.org/10.1108/EL-06-2017-0128
Pajić V, Vujicić-Stanković S, Stanković R, Pajić M. Semi-automatic extraction of multiword terms from domain-specific corpora. in Electronic Library. 2018;36(3):550-567.
doi:10.1108/EL-06-2017-0128
Pajić, Vesna, Vujicić-Stanković, Stasa, Stanković, Ranka, Pajić, Miloš, "Semi-automatic extraction of multiword terms from domain-specific corpora" in Electronic Library, 36, no. 3 (2018):550-567,
https://doi.org/10.1108/EL-06-2017-0128

WebMonitoring software system: Finite state machines for monitoring the web

Pajić, Vesna; Vitas, Duško; Pavlović-Lažetić, Gordana; Pajić, Miloš

(ComSIS Consortium, 2013)

TY  - JOUR
AU  - Pajić, Vesna
AU  - Vitas, Duško
AU  - Pavlović-Lažetić, Gordana
AU  - Pajić, Miloš
PY  - 2013
UR  - http://aspace.agrif.bg.ac.rs/handle/123456789/3275
AB  - This paper presents a software system called WebMonitoring. The system is designed for solving certain problems in the process of information search on the web. The first problem is improving entering of queries at search engines and enabling more complex searches than keyword-based ones. The second problem is providing access to web page content that is inaccessible by common search engines due to search engine’s crawling limitations or time difference between the moment a web page is set up on the Internet and the moment the crawler finds it. The architecture of the WebMonitoring system relies upon finite state machines and the concept of monitoring the web. We present the system’s architecture and usage. Some modules were originally developed for the purpose of the WebMonitoring system, and some rely on UNITEX, a linguistically oriented software system. We hereby evaluate the WebMonitoring system and give directions for further development.
PB  - ComSIS Consortium
T2  - Computer Science and Information Systems
T1  - WebMonitoring software system: Finite state machines for monitoring the web
EP  - 23
IS  - 1
SP  - 1
VL  - 10
DO  - 10.2298/CSIS110918036P
ER  - 
@article{
author = "Pajić, Vesna and Vitas, Duško and Pavlović-Lažetić, Gordana and Pajić, Miloš",
year = "2013",
abstract = "This paper presents a software system called WebMonitoring. The system is designed for solving certain problems in the process of information search on the web. The first problem is improving entering of queries at search engines and enabling more complex searches than keyword-based ones. The second problem is providing access to web page content that is inaccessible by common search engines due to search engine’s crawling limitations or time difference between the moment a web page is set up on the Internet and the moment the crawler finds it. The architecture of the WebMonitoring system relies upon finite state machines and the concept of monitoring the web. We present the system’s architecture and usage. Some modules were originally developed for the purpose of the WebMonitoring system, and some rely on UNITEX, a linguistically oriented software system. We hereby evaluate the WebMonitoring system and give directions for further development.",
publisher = "ComSIS Consortium",
journal = "Computer Science and Information Systems",
title = "WebMonitoring software system: Finite state machines for monitoring the web",
pages = "1-23",
number = "1",
volume = "10",
doi = "10.2298/CSIS110918036P"
}
Pajić, V., Vitas, D., Pavlović-Lažetić, G., & Pajić, M. (2013). WebMonitoring software system: Finite state machines for monitoring the web. In Computer Science and Information Systems, 10(1), 1-23. ComSIS Consortium.
https://doi.org/10.2298/CSIS110918036P
Pajić V, Vitas D, Pavlović-Lažetić G, Pajić M. WebMonitoring software system: Finite state machines for monitoring the web. in Computer Science and Information Systems. 2013;10(1):1-23.
doi:10.2298/CSIS110918036P
Pajić, Vesna, Vitas, Duško, Pavlović-Lažetić, Gordana, Pajić, Miloš, "WebMonitoring software system: Finite state machines for monitoring the web" in Computer Science and Information Systems, 10, no. 1 (2013):1-23,
https://doi.org/10.2298/CSIS110918036P