University of Belgrade - Faculty of Agriculture
AgroSpace - Faculty of Agriculture Repository

Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach

2011
Authors
Pajić, Vesna
Pavlović-Lažetić, Gordana
Pajić, Miloš
Conference object (Published version)
Abstract
The paper presents a new method for extracting information from semi-structured resources, based on finite state transducers. The method has two clearly distinguished phases. The first, a pre-processing phase, relies strongly on analysis of the document structure and is used to locate data records in the text. The second phase is based on finite state transducers created for extracting information. The transducers can be modified to achieve the preferred efficiency and can be reused to extract information from other pre-processed documents. We conclude that even untagged text can be treated as semi-structured, provided its structure can be successfully pre-processed. As a result, we extracted data from free-form encyclopedia text and created a fully structured database with genotype and phenotype characteristics of the organisms.
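The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (blank-line separated entries with the organism name on the first line), the sample text, the token classes, the state names, and the output fields `genome_size_mb` and `gram_stain` are all hypothetical, chosen only to show how a pre-processing step that locates records can feed a small transducer that emits structured fields.

```python
import re

def split_records(text):
    """Phase 1 (pre-processing): use layout to locate records.
    Assumption: records are separated by blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [b.strip() for b in blocks if b.strip()]

def classify(token):
    """Map a token to an input symbol of the transducer (hypothetical classes)."""
    low = token.lower()
    if low == "size:":
        return "SIZE_KW"
    if low == "stain:":
        return "STAIN_KW"
    if re.fullmatch(r"\d+(\.\d+)?", token):
        return "NUM"
    return "OTHER"

# (state, input symbol) -> (next state, output field or None).
# Any pair not listed falls back to the "scan" state with no output.
TRANSITIONS = {
    ("scan", "SIZE_KW"): ("size", None),
    ("size", "NUM"): ("scan", "genome_size_mb"),
    ("scan", "STAIN_KW"): ("stain", None),
    ("stain", "OTHER"): ("scan", "gram_stain"),
}

def transduce(tokens):
    """Phase 2: run the transducer over one record's tokens."""
    state, fields = "scan", {}
    for tok in tokens:
        state, field = TRANSITIONS.get((state, classify(tok)), ("scan", None))
        if field is not None:
            fields[field] = tok
    return fields

def extract(text):
    """Apply both phases and return one dict per record."""
    rows = []
    for record in split_records(text):
        name, _, body = record.partition("\n")
        stripped = (t.strip(".,;") for t in body.split())
        tokens = [t for t in stripped if t]
        rows.append({"organism": name.strip(), **transduce(tokens)})
    return rows

# Hypothetical sample text, invented for this illustration.
SAMPLE = """Escherichia coli
Genome size: 4.6 Mb. Gram stain: negative.

Bacillus subtilis
Gram stain: positive. Genome size: 4.2 Mb."""

for row in extract(SAMPLE):
    print(row)
```

Because the transition table is plain data, it can be edited or swapped without touching the pre-processing step, which loosely mirrors the abstract's point that the transducers can be modified and reused on other pre-processed documents.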
Keywords:
information extraction / finite state transducer / semi-structured resource / linguistic resource / bioinformatics / genome
Source:
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and, 2011, 6807 LNCS, 282-289
Publisher:
  • 16th International Conference on Implementation and Application of Automata, CIAA 2011

DOI: 10.1007/978-3-642-22256-6_26

ISSN: 0302-9743

WoS: 000304127100026

Scopus: 2-s2.0-79961198242
URI
http://aspace.agrif.bg.ac.rs/handle/123456789/2415
Collections
  • Radovi istraživača / Researchers’ publications
Institution/Community
Poljoprivredni fakultet
TY  - CONF
AU  - Pajić, Vesna
AU  - Pavlović-Lažetić, Gordana
AU  - Pajić, Miloš
PY  - 2011
UR  - http://aspace.agrif.bg.ac.rs/handle/123456789/2415
AB  - The paper presents a new method for extracting information from semi-structured resources, based on finite state transducers. The method has two clearly distinguished phases. The first, a pre-processing phase, relies strongly on analysis of the document structure and is used to locate data records in the text. The second phase is based on finite state transducers created for extracting information. The transducers can be modified to achieve the preferred efficiency and can be reused to extract information from other pre-processed documents. We conclude that even untagged text can be treated as semi-structured, provided its structure can be successfully pre-processed. As a result, we extracted data from free-form encyclopedia text and created a fully structured database with genotype and phenotype characteristics of the organisms.
PB  - 16th International Conference on Implementation and Application of Automata, CIAA 2011
C3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and
T1  - Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach
EP  - 289
SP  - 282
VL  - 6807 LNCS
DO  - 10.1007/978-3-642-22256-6_26
ER  - 
@conference{
author = "Pajić, Vesna and Pavlović-Lažetić, Gordana and Pajić, Miloš",
year = "2011",
abstract = "The paper presents a new method for extracting information from semi-structured resources, based on finite state transducers. The method has two clearly distinguished phases. The first, a pre-processing phase, relies strongly on analysis of the document structure and is used to locate data records in the text. The second phase is based on finite state transducers created for extracting information. The transducers can be modified to achieve the preferred efficiency and can be reused to extract information from other pre-processed documents. We conclude that even untagged text can be treated as semi-structured, provided its structure can be successfully pre-processed. As a result, we extracted data from free-form encyclopedia text and created a fully structured database with genotype and phenotype characteristics of the organisms.",
publisher = "16th International Conference on Implementation and Application of Automata, CIAA 2011",
journal = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and",
title = "Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach",
pages = "282-289",
volume = "6807 LNCS",
doi = "10.1007/978-3-642-22256-6_26"
}
Pajić, V., Pavlović-Lažetić, G., & Pajić, M. (2011). Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and
16th International Conference on Implementation and Application of Automata, CIAA 2011, 6807 LNCS, 282-289.
https://doi.org/10.1007/978-3-642-22256-6_26
Pajić V, Pavlović-Lažetić G, Pajić M. Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and. 2011;6807 LNCS:282-289.
doi:10.1007/978-3-642-22256-6_26
Pajić, Vesna, Pavlović-Lažetić, Gordana, Pajić, Miloš, "Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach" in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and, 6807 LNCS (2011):282-289,
https://doi.org/10.1007/978-3-642-22256-6_26

DSpace software copyright © 2002-2015  DuraSpace
About the AgroSpace Repository | Send Feedback
