Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach
2011
Conference contribution (Published version)

Metadata
Abstract
The paper presents a new method for extracting information from semi-structured resources, based on finite state transducers. The method has two clearly distinguished phases. The first, pre-processing phase relies heavily on analysis of the document structure and is used to locate data records in the text. The second phase uses finite state transducers created for extracting information. The transducers can be modified to achieve the preferred efficiency and can be reused to extract information from other pre-processed documents. We conclude that even untagged text can be treated as semi-structured, provided that its structure can be successfully pre-processed. As a result, we extracted data from free-form encyclopedia text and created a fully structured database of the genotype and phenotype characteristics of organisms.
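The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sample text, the organism names, the blank-line record delimiter, the field labels, and the transition table are all assumptions made up for illustration. Phase 1 uses document structure to locate records; phase 2 runs a small finite state transducer over the tokens of each record and emits labelled values.

```python
import re

# Made-up sample text (hypothetical organisms): entries are separated by
# blank lines and start with the organism name followed by a period.
SAMPLE = """Bacillus exemplus. Cells are rod-shaped. Genome size is 4.2 Mb.

Coccus samplei. Cells are spherical. Genome size is 2.1 Mb."""

TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z-]+")

# Transducer: (state, input symbol) -> (next state, output label or None).
# "<word>" and "<num>" are token classes assumed by this sketch.
TRANSITIONS = {
    ("start", "cells"): ("c1", None),
    ("c1", "are"): ("c2", None),
    ("c2", "<word>"): ("start", "shape"),
    ("start", "genome"): ("g1", None),
    ("g1", "size"): ("g2", None),
    ("g2", "is"): ("g3", None),
    ("g3", "<num>"): ("start", "genome_size_mb"),
}


def locate_records(text):
    """Phase 1: split the text into records using its layout (blank lines)."""
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]


def transduce(record):
    """Phase 2: run the transducer over one record and collect its outputs."""
    state, fields = "start", {}
    for token in TOKEN.findall(record):
        literal = token.lower()
        token_class = "<num>" if token[0].isdigit() else "<word>"
        step = None
        # Try the current state first; if it has no matching transition,
        # retry from "start" so a failed path does not swallow the token.
        for candidate in (state, "start"):
            step = (TRANSITIONS.get((candidate, literal))
                    or TRANSITIONS.get((candidate, token_class)))
            if step:
                break
        if step is None:
            state = "start"
            continue
        state, label = step
        if label:
            fields[label] = token  # emit the input token under its label
    return fields


def extract(text):
    """Run both phases and return one dictionary per located record."""
    rows = []
    for record in locate_records(text):
        row = {"organism": record.split(".")[0]}
        row.update(transduce(record))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract(SAMPLE):
        print(row)
```

Because the transition table is plain data, it can be edited or swapped without touching the pre-processing step, which mirrors the abstract's point that the transducers can be modified and reused on other pre-processed documents.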
Keywords:
information extraction / finite state transducer / semi-structured resource / linguistic resource / bioinformatics / genome
Source:
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and, 2011, 6807 LNCS, 282-289
Publisher:
- 16th International Conference on Implementation and Application of Automata, CIAA 2011
DOI: 10.1007/978-3-642-22256-6_26
ISSN: 0302-9743
WoS: 000304127100026
Scopus: 2-s2.0-79961198242
Institution/group
Poljoprivredni fakultet

TY - CONF
AU - Pajić, Vesna
AU - Pavlović-Lažetić, Gordana
AU - Pajić, Miloš
PY - 2011
UR - http://aspace.agrif.bg.ac.rs/handle/123456789/2415
AB - The paper presents a new method for extracting information from semi-structured resources, based on finite state transducers. The method has two clearly distinguished phases. The first phase - pre-processing phase strongly relies upon the analysis of the document structure and it is used for locating records of data in the text. The second phase is based on the finite state transducers created for extracting information. The transducers can be modified so that preferred efficiency is achieved and can be reused for extracting information from other pre-processed documents. We conclude that even untagged text can be treated as a semi-structured one, providing its structure can be successfully pre-processed. As a result, we extracted data from free form encyclopedia text and created a fully structured database with genotype and phenotype characteristics of the organisms.
PB - 16th International Conference on Implementation and Application of Automata, CIAA 2011
C3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and
T1 - Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach
EP - 289
SP - 282
VL - 6807 LNCS
DO - 10.1007/978-3-642-22256-6_26
ER -

@conference{
  author = "Pajić, Vesna and Pavlović-Lažetić, Gordana and Pajić, Miloš",
  year = "2011",
  abstract = "The paper presents a new method for extracting information from semi-structured resources, based on finite state transducers. The method has two clearly distinguished phases. The first phase - pre-processing phase strongly relies upon the analysis of the document structure and it is used for locating records of data in the text. The second phase is based on the finite state transducers created for extracting information. The transducers can be modified so that preferred efficiency is achieved and can be reused for extracting information from other pre-processed documents. We conclude that even untagged text can be treated as a semi-structured one, providing its structure can be successfully pre-processed. As a result, we extracted data from free form encyclopedia text and created a fully structured database with genotype and phenotype characteristics of the organisms.",
  publisher = "16th International Conference on Implementation and Application of Automata, CIAA 2011",
  journal = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and",
  title = "Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach",
  pages = "282-289",
  volume = "6807 LNCS",
  doi = "10.1007/978-3-642-22256-6_26"
}

Pajić, V., Pavlović-Lažetić, G., & Pajić, M. (2011). Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and 16th International Conference on Implementation and Application of Automata, CIAA 2011, 6807 LNCS, 282-289. https://doi.org/10.1007/978-3-642-22256-6_26
Pajić V, Pavlović-Lažetić G, Pajić M. Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and. 2011;6807 LNCS:282-289. doi:10.1007/978-3-642-22256-6_26
Pajić, Vesna, Pavlović-Lažetić, Gordana, Pajić, Miloš, "Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach" in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and, 6807 LNCS (2011):282-289, https://doi.org/10.1007/978-3-642-22256-6_26