Scientific knowledge is one of the greatest assets of humankind.
This knowledge is recorded and disseminated in scientific publications, and the body of scientific literature is growing at an enormous rate.
Automatic methods for processing and cataloguing that information are necessary to help scientists navigate it, and to facilitate automated reasoning, discovery and decision making over the data.
Structured information can be extracted at different levels of granularity.
Previous and ongoing work has focused on bibliographic information (segmentation and linking of referenced literature; Wick et al., 2013), keyword extraction and categorization (e.g., identifying the tasks, materials and processes central to a publication; Augenstein et al., 2017), and cataloguing research findings. Scientific discoveries can often be represented as pairwise relationships, e.g., protein-protein (Mallory et al., 2016), drug-drug (Segura-Bedmar et al., 2013), and chemical-disease (Li et al., 2016) interactions, or as more complicated networks such as action graphs describing scientific procedures (e.g., synthesis recipes in materials science; Mysore et al., 2017). Information extracted with such methods can be enriched with time stamps and other meta-information, such as indicators of uncertainty or limitations of the discovered facts (Zhou et al., 2015).
Structured representations, such as knowledge graphs, summarize information from a variety of sources in a convenient and machine-readable format. Graph representations that link the information from a large body of publications can reveal patterns and lead to the discovery of new information that would not be apparent from the analysis of a single publication. This kind of aggregation can lead to new scientific insights (Kim et al., 2017), help detect trends (Prabhakaran et al., 2016), or find experts in a particular scientific area (Neshati et al., 2014).
While various workshops have focused separately on several aspects -- extraction of information from scientific articles, building and using knowledge graphs, the analysis of bibliographical information, graph algorithms for text analysis -- the proposed workshop focuses on processing scientific articles and creating structured repositories such as knowledge graphs for finding new information and making scientific discoveries.
The aim of this workshop is to identify the representations needed to facilitate automated reasoning over scientific information, and to bring together experts in natural language processing and information extraction with scientists from other domains (e.g., materials science, biomedical research) who want to leverage the vast amount of information stored in scientific publications.
Call for Papers
We invite submissions on (but not limited to) the following topics:
Information extraction from scientific publications
identification of concepts in scientific articles (in various domains)
extraction of relations from scientific articles (in various domains), including n-ary relations with n>2 and “negative relations”
large scale information extraction, clustering and detection of trends in scientific fields
targeted information extraction for completing knowledge graphs
updating knowledge graphs (adding new information, removing erroneous facts, or possibly having explicit links for incorrect statements)
finding patterns and mining new information in knowledge graphs
automatic generation and ranking of scientific hypotheses
aggregation and extraction of human-understandable scientific rules and generalities
extraction of script-knowledge and scientific procedures
detection of (inferred or explicitly stated) causality
automated reasoning over repositories of extracted information
Isabelle Augenstein, Mrinal Das, Sebastian Riedel, Lakshmi Vikraman, and Andrew McCallum. 2017. SemEval 2017 task 10: ScienceIE - extracting keyphrases and relations from scientific publications. (SemEval-2017)
Edward Kim, Kevin Huang, Adam Saunders, Andrew McCallum, Gerbrand Ceder, and Elsa Olivetti. 2017. Materials synthesis insights from scientific literature via text extraction and machine learning. Chemistry of Materials 29(21).
Jiao Li, Yueping Sun, Robin J Johnson, Daniela Sciaky, Chih-Hsuan Wei, Robert Leaman, Allan Peter Davis, Carolyn J Mattingly, Thomas C Wiegers, and Zhiyong Lu. 2016. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database: The Journal of Biological Databases and Curation.
Emily K. Mallory, Ce Zhang, Christopher Ré, and Russ B. Altman. 2016. Large-scale extraction of gene interactions from full-text literature using DeepDive. Bioinformatics 32(1):106–113.
Sheshera Mysore, Edward Kim, Emma Strubell, Ao Liu, Haw-Shiuan Chang, Srikrishna Kompella, Kevin Huang, Andrew McCallum, and Elsa Olivetti. 2017. Automatically extracting action graphs from materials science synthesis procedures.
Mahmood Neshati, Djoerd Hiemstra, Ehsaneddin Asgari, and Hamid Beigy. 2014. Integration of scientific and social networks. World Wide Web 17(5).
Vinodkumar Prabhakaran, William L. Hamilton, Dan McFarland, and Dan Jurafsky. 2016. Predicting the rise and fall of scientific topics from trends in their rhetorical framing. (ACL 2016)
Isabel Segura-Bedmar, Paloma Martínez, and María Herrero Zazo. 2013. SemEval-2013 task 9: Extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013). (*SEM)
Michael L Wick, Ari Kobren, and Andrew McCallum. 2013. Large-scale author coreference via hierarchical entity represen-
Huiwei Zhou, Huijie Deng, Degen Huang, and Minling Zhu. 2015. Hedge scope detection in biomedical texts: An effective dependency-based method. PLoS One 10(7).