Indra

Indra is a Web Service which allows easy access to different distributional semantics models in several languages.

What is Indra?

The creation of real-world Artificial Intelligence (AI) applications is dependent on leveraging a large volume of commonsense knowledge. Simple semantic interpretation tasks such as understanding that if ‘A is married to B’ then ‘A is the spouse of B’ or that ‘car, vehicle, auto’ have very similar meanings are examples of semantic approximation operations/inferences that are present in practically all applications of AI that interpret natural language.

Many AI applications depend on being semantically flexible, i.e. coping with the large vocabulary variation that is permitted by natural language. Sentiment Analysis, Question Answering, Information Extraction, Semantic Search and Classification are examples of tasks in which the ability to do semantic approximation is a central requirement.

Distributional Semantics Models and Word Vector models emerged as successful approaches for supporting semantic approximation, thanks to their ability to build comprehensive semantic approximation models and to their simplicity of representation.

Indra is a distributional semantics engine which facilitates the deployment of robust distributional semantic models for industry-level applications.

Features

Supported Models

Word Embeddings

This is the payload consumed by Indra to serve Word Embeddings of words or phrases.

Request data model (POST /vectors)

{
	"corpus": "wiki-2014",
	"model": "W2V",
	"language": "EN",
	"terms": ["love", "mother"]
}

Field corpus

The name of the corpus used to build the models:

Field model

The distributional model:

Field language

The two-letter ISO 639-1 language code:
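
Putting the three fields together with the list of terms, the /vectors request payload can be built programmatically. A minimal sketch in Python, using the values from the example above:

```python
import json

# Build the /vectors request payload from its four fields.
payload = {
    "corpus": "wiki-2014",       # name of the corpus used to build the model
    "model": "W2V",              # distributional model
    "language": "EN",            # two-letter ISO 639-1 code
    "terms": ["love", "mother"]  # words or phrases to embed
}

body = json.dumps(payload)
print(body)
```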

Response model

This is the response for the request above.

{
  "corpus": "wiki-2014",
  "model": "W2V",
  "language": "EN",
  "terms":
    {
      "love" : [0.333, 0.21, 0.3532],
      "mother" : [0.6356, 0.756, 0.9867]
    }
}

If the model provides sparse vectors, the terms attribute is defined as follows:

{
  "love" : { "0" : 0.333, "1" : 0.21, "2" : 0.3532 },
  "mother" : { "0" : 0.6356, "1" : 0.756, "2" : 0.9867 }
}

Currently, only the ESA model is sparse.
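Since dense and sparse responses differ in shape, a client may want to normalize sparse vectors into dense ones. A minimal sketch, assuming the sparse keys are string indices into a vector of known dimension (the helper name is hypothetical, not part of the API):

```python
def to_dense(sparse_vec, dim):
    """Expand a sparse {index: value} map into a dense list of floats.
    Indices absent from the map are zero."""
    dense = [0.0] * dim
    for idx, value in sparse_vec.items():
        dense[int(idx)] = value
    return dense

# Sparse vector in the shape returned by a sparse model such as ESA.
sparse = {"0": 0.333, "2": 0.3532}
print(to_dense(sparse, 4))  # → [0.333, 0.0, 0.3532, 0.0]
```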

Semantic Similarity

This is the payload consumed by Indra to compute Semantic Similarity between pairs of words or phrases.

Request data model (POST /relatedness)

{
	"corpus": "wiki-2014",
	"model": "W2V",
	"language": "EN",
	"scoreFunction": "COSINE",
	"pairs": [{
		"t2": "love",
		"t1": "mother"
	},
	{
		"t2": "love",
		"t1": "father"
	}]
}

The fields corpus, model and language have the same definitions as shown previously.

Field scoreFunction

The function to compute the relatedness between the distributional vectors:

Response model

This is the response for the request above.

{
  "corpus": "wiki-2014",
  "model": "W2V",
  "language": "EN",
  "pairs": [
    {
      "t1": "mother",
      "t2": "love",
      "score": 0.45996829519139865
    },
    {
      "t1": "father",
      "t2": "love",
      "score": 0.32337835808129745
    }
  ],
  "scoreFunction": "COSINE"
}
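
With COSINE as the score function, the returned score is the cosine of the angle between the two term vectors. A minimal sketch of that computation (the vectors below are illustrative values, not real model output):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v_love = [0.333, 0.21, 0.3532]
v_mother = [0.6356, 0.756, 0.9867]
print(cosine(v_love, v_mother))
```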

One-to-many request data model (POST /relatedness/otm)

{
  "corpus": "wiki-2014",
  "model": "W2V",
  "language": "EN",
  "scoreFunction": "COSINE",
  "one": "love",
  "many": ["mother", "father", "child"]
}

One-to-many response model

This is the response for the request above.

{
  "corpus" : "wiki-2014",
  "model" : "W2V",
  "language" : "EN",
  "scoreFunction": "COSINE",
  "one" : "love",
  "many" : 
   {
      "mother" : 0.45996829519139865,
      "father": 0.32337835808129745,
      "child": 0.39881548413514684
   }
}
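
A common client-side use of the one-to-many response is ranking the many terms by their similarity to the one term. A minimal sketch using the response above:

```python
# "many" attribute from the one-to-many response above.
many = {
    "mother": 0.45996829519139865,
    "father": 0.32337835808129745,
    "child": 0.39881548413514684,
}

# Sort terms by score, most similar first.
ranked = sorted(many, key=many.get, reverse=True)
print(ranked)  # → ['mother', 'child', 'father']
```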

Translated Word Embeddings and Semantic Similarity

For translated word embeddings and translated semantic similarity, just add "mt": true to the JSON payload.
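
For instance, the flag can be added to a regular /relatedness payload. A minimal sketch (the language code and terms are illustrative, chosen only to show a non-English source):

```python
import json

payload = {
    "corpus": "wiki-2014",
    "model": "W2V",
    "language": "PT",              # source language of the terms
    "scoreFunction": "COSINE",
    "pairs": [{"t1": "mãe", "t2": "amor"}],
    "mt": True,                    # enable machine-translated similarity
}
print(json.dumps(payload))
```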

Usage

If you want to try Indra on your own infrastructure, take a look at Indra-Composed.

Public Endpoint

We provide a public endpoint for demonstration purposes only, so you can try it right now with cURL on the command line.

For word embeddings:

curl -X POST -H "Content-Type: application/json" -d '{
	"corpus": "wiki-2014",
	"model": "W2V",
	"language": "EN",
	"terms": ["love", "mother"]
}' "http://indra.lambda3.org/vectors"

For semantic similarity:

curl -X POST -H "Content-Type: application/json" -d '{
	"corpus": "wiki-2014",
	"model": "W2V",
	"language": "EN",
	"scoreFunction": "COSINE",
	"pairs": [{
		"t2": "love",
		"t1": "mother"
	},
	{
		"t2": "love",
		"t1": "father"
	}]
}' "http://indra.lambda3.org/relatedness"

Citing Indra

Please cite Indra if you use it in your experiments or projects.

@Inbook{Freitas2016,
author="Freitas, Andr{\'e}
and Barzegar, Siamak
and Sales, Juliano Efson
and Handschuh, Siegfried
and Davis, Brian",
editor="Blomqvist, Eva
and Ciancarini, Paolo
and Poggi, Francesco
and Vitali, Fabio",
title="Semantic Relatedness for All (Languages): A Comparative Analysis of Multilingual Semantic Relatedness Using Machine Translation",
bookTitle="Knowledge Engineering and Knowledge Management: 20th International Conference, EKAW 2016, Bologna, Italy, November 19-23, 2016, Proceedings",
year="2016",
publisher="Springer International Publishing",
address="Cham",
pages="212--222",
isbn="978-3-319-49004-5",
doi="10.1007/978-3-319-49004-5_14",
url="http://dx.doi.org/10.1007/978-3-319-49004-5_14"
}

Contributors (alphabetical order)