
Graphene

Coreference Resolution, Simplification and Open Relation Extraction Pipeline


Graphene: Knowledge Graph / Open Relation Extraction

Motivation

Graphene is an information extraction pipeline that extracts knowledge graphs from text (n-ary relations and rhetorical structures extracted from complex factoid discourse). Given a sentence or a text, Graphene outputs a semantic representation of it in the form of a labeled directed graph (a knowledge graph). This knowledge graph can later be used for a range of AI tasks, such as building question answering systems, extracting structured data from text, and supporting semantic inference. Unlike existing open relation extraction tools, which focus on the main relation expressed in a sentence, Graphene aims to maximize the extraction of contextual relations. For example:

Trump withdrew his sponsorship after the second Tour de Trump in 1990 because his business ventures were experiencing financial woes.

Graphene-Extraction

In order to capture all the contextual information, Graphene performs the following steps:

Graphene's extracted graphs are represented in our RDFNL format, a simple format for complex contextual relations that balances machine processability with human legibility. A description of the RDFNL format can be found here. To further improve the processability of the extracted relations, Graphene can materialize them into a proper RDF graph serialized according to the N-Triples specification of the RDF standard. A description of the RDF format can be found here. Alternatively, developers can use the API's direct output class, which can be serialized to and deserialized from JSON.
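The following Java sketch illustrates what materializing a single relation as N-Triples can look like. It models each extracted relation as its own node carrying subject, predicate and object literals, and escapes literals as the N-Triples grammar requires. The namespace http://example.org/graphene/ and the property names subject, predicate and object are hypothetical placeholders; Graphene's actual RDF vocabulary is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: emit one extracted relation as N-Triples lines.
// ASSUMPTION: the namespace and property names are hypothetical placeholders,
// not Graphene's actual RDF vocabulary.
public class NTriplesSketch {
    static final String NS = "http://example.org/graphene/"; // hypothetical namespace

    // Escape the characters that may not appear raw inside an N-Triples "..." literal.
    static String escapeLiteral(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // One N-Triples statement: <subject> <predicate> "literal" .
    static String triple(String subjectIri, String predicateIri, String literal) {
        return "<" + subjectIri + "> <" + predicateIri + "> \"" + escapeLiteral(literal) + "\" .";
    }

    // Give the relation its own node (keyed by its id) and attach its three parts as literals.
    static List<String> relationToTriples(String id, String subject, String predicate, String object) {
        String node = NS + "relation/" + id;
        List<String> out = new ArrayList<>();
        out.add(triple(node, NS + "subject", subject));
        out.add(triple(node, NS + "predicate", predicate));
        out.add(triple(node, NS + "object", object));
        return out;
    }

    public static void main(String[] args) {
        relationToTriples("bacf06771e0f4fc5a8e68c30fc77c9c4",
                "the Treasury", "will announce", "details of the November refunding")
            .forEach(System.out::println);
    }
}
```

Giving each relation its own node keeps the relation id usable as an IRI, so contextual links between relations could be added as further triples between those nodes.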

Example Extractions

Sentence Extraction

Although the Treasury will announce details of the November refunding on Monday, the funding will be delayed if Congress and President Bush fail to increase the Treasury's borrowing capacity.

The serialized class: JSON
The RDFNL format:

# Although the Treasury will announce details of the November refunding on Monday , the funding will be delayed if Congress and President Bush fail to increase the Treasury 's borrowing capacity .

bacf06771e0f4fc5a8e68c30fc77c9c4    0    the Treasury    will announce    details of the November refunding
    S:TEMPORAL    on Monday .
    L:CONTRAST    948eeebd73564adab7dee5c6f177b3b9

948eeebd73564adab7dee5c6f177b3b9    0    the funding    will be delayed        
    L:CONDITION 006a71e51295440fab7a8e8c697d2ba6
    L:CONDITION e4d86228cff443b7a8e9f6d8a5c5987b
    L:CONTRAST    bacf06771e0f4fc5a8e68c30fc77c9c4

006a71e51295440fab7a8e8c697d2ba6    1    Congress    fail    to increase the Treasury 's borrowing capacity
    L:LIST    e4d86228cff443b7a8e9f6d8a5c5987b

e4d86228cff443b7a8e9f6d8a5c5987b    1    president Bush    fail    to increase the Treasury 's borrowing capacity
    L:LIST    006a71e51295440fab7a8e8c697d2ba6
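To make the layout of the example above concrete, here is a minimal Java (16+) sketch of a reader for it. The field layout is inferred from this one example and is an assumption: relation lines hold an id, a context layer, subject, predicate and an optional object separated by tabs; indented lines attach S: contexts (a label plus text) or L: contexts (a label plus the id of another relation) to the preceding relation. The class and record names are hypothetical and not part of Graphene's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal RDFNL reader sketch. ASSUMPTIONS (inferred from the example, not from a spec):
//  - relation lines: id, context layer, subject, predicate, object (may be empty), tab-separated;
//  - indented lines: "S:<LABEL> <text>" or "L:<LABEL> <relation id>" for the preceding relation;
//  - lines starting with '#' contain the source sentence and are skipped.
public class RdfnlSketch {
    record Context(String kind, String label, String value) {} // kind is "S" or "L"
    record Relation(String id, int layer, String subject, String predicate, String object,
                    List<Context> contexts) {}

    static Map<String, Relation> parse(String text) {
        Map<String, Relation> relations = new LinkedHashMap<>();
        Relation current = null;
        for (String line : text.split("\\R")) {
            if (line.isBlank() || line.startsWith("#")) continue;
            if (Character.isWhitespace(line.charAt(0))) {
                // Indented line: a context belonging to the most recent relation.
                if (current == null) continue;
                String[] parts = line.trim().split("\\s+", 2);
                String[] kindAndLabel = parts[0].split(":", 2);
                if (kindAndLabel.length < 2) continue;
                String value = parts.length > 1 ? parts[1].trim() : "";
                current.contexts().add(new Context(kindAndLabel[0], kindAndLabel[1], value));
            } else {
                // Relation line; limit -1 keeps an empty trailing object field.
                String[] f = line.split("\t", -1);
                if (f.length < 4) continue;
                String object = f.length > 4 ? f[4].trim() : "";
                // A non-numeric layer field throws NumberFormatException in this sketch.
                current = new Relation(f[0].trim(), Integer.parseInt(f[1].trim()),
                        f[2].trim(), f[3].trim(), object, new ArrayList<>());
                relations.put(current.id(), current);
            }
        }
        return relations;
    }

    // Linked (L:) contexts are the labeled edges between relations.
    static List<String> edges(Map<String, Relation> relations) {
        List<String> out = new ArrayList<>();
        for (Relation r : relations.values())
            for (Context c : r.contexts())
                if (c.kind().equals("L")) out.add(r.id() + " -" + c.label() + "-> " + c.value());
        return out;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "a\t0\tthe Treasury\twill announce\tdetails of the November refunding",
                "\tS:TEMPORAL\ton Monday .",
                "\tL:CONTRAST\tb",
                "",
                "b\t0\tthe funding\twill be delayed\t",
                "\tL:CONTRAST\ta");
        edges(parse(sample)).forEach(System.out::println);
    }
}
```

Running main on the two-relation sample prints "a -CONTRAST-> b" and "b -CONTRAST-> a", the labeled edges of the extracted graph.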

The RDF N-Triples format: NT

Full text extraction of the Barack Obama Wikipedia Page (2017-11-06):

The serialized class: JSON
The RDFNL format: RDFNL
The RDF N-Triples format: RDF

Contributors (alphabetical order)

Requirements

Setup

Compiling and packaging requires two additional packages:

Sentence Simplification

cd /tmp
wget https://github.com/Lambda-3/SentenceSimplification/archive/v5.0.0.tar.gz -O SentenceSimplification.tar.gz
tar xfa SentenceSimplification.tar.gz
cd SentenceSimplification-5.0.0
mvn -DskipTests install

Discourse Simplification

cd /tmp
wget https://github.com/Lambda-3/DiscourseSimplification/archive/v8.0.0.tar.gz -O DiscourseSimplification.tar.gz
tar xfa DiscourseSimplification.tar.gz
cd DiscourseSimplification-8.0.0
mvn -DskipTests install

More dependencies (requires docker)

Prior to running Graphene, two additional dependencies must be met:

Both are provided with the docker images:

Setup of Graphene

Graphene-Core is built with

mvn clean package -DskipTests

To build the server component, activate the server profile:

mvn -P server clean package -DskipTests

To build the command-line interface, activate the cli profile:

mvn -P cli clean package -DskipTests

To build both interfaces, you can specify both profiles:

mvn -P cli -P server clean package -DskipTests

Docker-Compose

Create a new config file and adjust your settings:

touch conf/graphene.conf

Then, you can build and start the composed images:

docker-compose up

Usage

Graphene-Core

Graphene comes with a Java API, which is described here. It requires a running PyCobalt instance, which is provided in docker-compose-core.yml. Start it with docker-compose -f docker-compose-core.yml up, then point Graphene to it in the config file:

graphene {
	coreference.url = "http://localhost:5128/resolve"
}
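Because Graphene-Core depends on the PyCobalt service configured as coreference.url, a quick pre-flight check can turn a confusing failure into a clear message. The Java sketch below only verifies that the host and port taken from the configured URL accept TCP connections; it does not send a request to /resolve, since the request format is not described here. The class name CorefPreflight and the default URL argument are illustrative assumptions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

// Minimal pre-flight sketch. ASSUMPTION: the coreference service (PyCobalt) listens
// at the URL set in graphene.coreference.url. Only TCP reachability is checked.
public class CorefPreflight {
    // Port from the URL, falling back to the scheme's default when none is given.
    static int port(String url) {
        URI uri = URI.create(url);
        if (uri.getPort() != -1) return uri.getPort();
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // True if a TCP connection to the URL's host and port succeeds within the timeout.
    static boolean isReachable(String url, int timeoutMillis) {
        String host = URI.create(url).getHost();
        if (host == null) return false;
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port(url)), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://localhost:5128/resolve";
        System.out.println(isReachable(url, 2000)
                ? "coreference service reachable at " + url
                : "coreference service not reachable at " + url
                  + "; start it with: docker-compose -f docker-compose-core.yml up");
    }
}
```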

Graphene-Server

For simplified access, we wrapped the Graphene-Core library in a REST-like web service.

docker-compose up

The usage of the Graphene-Server is described here.

Graphene-CLI

Another way of accessing our service is the command-line interface, which is described here. As with Graphene-Core, a PyCobalt instance must be running beforehand.