Webservices

Input Tools


These components start an OpeNER pipeline. For now, only a language identifier is available.

Language Identifier

The language identifier receives plain text and outputs the language of that text. The identified language can be passed to the OpeNER modules that require a language parameter.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/language-identifier.
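
As a rough illustration, a request to this endpoint could be prepared as shown below. This is a minimal Python sketch, not the service's documented API: it assumes the service accepts a form-encoded POST with the text in a field named "input", and build_request is a hypothetical helper introduced here. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

LANGUAGE_IDENTIFIER_URL = "http://opener.olery.com/language-identifier"


def build_request(endpoint: str, text: str, **params: str) -> Request:
    """Build (but do not send) a form-encoded POST request.

    Assumption: the service reads the text from a form field named "input".
    Extra keyword arguments are sent as additional form fields.
    """
    fields = {"input": text, **params}
    body = urlencode(fields).encode("utf-8")
    return Request(endpoint, data=body, method="POST")


request = build_request(LANGUAGE_IDENTIFIER_URL, "This is an English text.")
# To send it: urllib.request.urlopen(request).read().decode("utf-8")
```

Check the endpoint page for the parameter names the service actually expects before relying on this.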

Basics


These components form the first stage of each OpeNER pipeline. Every further operation needs tokens, polarities and lemmas. For your convenience, multiple POS taggers are available.

Tokenizer

The tokenizer receives plain text and a language parameter as input. It splits the text into paragraphs, sentences and tokens and generates a KAF document with this information. The resulting KAF document can be used as input to other modules.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/tokenizer.

POS Tagger

Part-of-speech (POS) tagging means identifying whether each word is a noun, a verb, etc. This component provides Perceptron and Maximum Entropy POS models for English, Spanish, French, Italian and Dutch. The models have been trained with the Apache OpenNLP Machine Learning API. The component also provides dictionary-based lemmatization, which identifies the lemma (dictionary entry) for a given word.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/pos-tagger.
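
Because most components take the previous component's output as their input, a pipeline amounts to a chain of calls. The sketch below shows that chaining in Python. It is an assumption-laden illustration: the "input" field name, the run_pipeline helper and the fake_post stand-in are all hypothetical, and the language parameter the tokenizer needs is left out because its name is not given here.

```python
from typing import Callable, Dict, List

BASE_URL = "http://opener.olery.com"

# A transport takes (url, form_fields) and returns the response body as text.
Transport = Callable[[str, Dict[str, str]], str]


def run_pipeline(text: str, components: List[str], post: Transport) -> str:
    """Feed the output of each component into the next one, in order."""
    document = text
    for name in components:
        # Assumption: each service reads its input from a field named "input".
        document = post(f"{BASE_URL}/{name}", {"input": document})
    return document


def fake_post(url: str, fields: Dict[str, str]) -> str:
    """Offline stand-in for a real HTTP call: wraps the input in a tag
    named after the component, so the call order stays visible."""
    name = url.rsplit("/", 1)[-1]
    return f"<{name}>{fields['input']}</{name}>"


result = run_pipeline("Nice hotel.", ["tokenizer", "pos-tagger"], fake_post)
```

Passing the transport in as a function keeps the chaining logic testable without network access; a real transport would perform the HTTP POST instead.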

Tree Tagger

This tool implements a wrapper for TreeTagger (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger) that applies the tagger to KAF files and returns the result in KAF format as well. It provides the lemmas and part-of-speech tags of the tokens in a given text and works for all the OpeNER languages. The tagset used is the one specified in the KAF definition, but the original TreeTagger tags are also stored in the result.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/tree-tagger.

NER/NED/Coreference


Combined, this group of components delivers state-of-the-art Named Entity Recognition and Named Entity Disambiguation.

NER

Named Entity Recognition and Classification identifies names (of persons, cities, museums and so on) and assigns each one to a semantic class (PERSON, LOCATION, etc.). This component uses the Apache OpenNLP API to provide Perceptron and Maximum Entropy models trained for Dutch, English, French, German, Italian and Spanish.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/ner.

NED

For a given Named Entity detected by the NER, the Named Entity Disambiguation component aims to identify which actual entity in a catalogue the name refers to. This component is a client that queries DBpedia Spotlight (http://spotlight.dbpedia.org).

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/ned.

Coreference

Coreference resolution aims at identifying the words that refer to the same object or entity. This component is loosely based on the Stanford Multi Sieve Coreference resolution system (http://www-nlp.stanford.edu/software/dcoref.shtml). The coreference resolution system uses all the linguistic information provided by the tokenizer, POS tagger, NER, NED and constituent parser.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/coreference.

Constituent Parser

Parsing means providing the syntactic tree representation of a sentence. This component provides shift-reduce style constituent parsers for English, French, Italian and Spanish, trained using the Apache OpenNLP API. Constituent parsing is primarily used in OpeNER as input to the coreference resolution system.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/constituent-parser.

Polarity, Properties and Opinions


Combined, these components deliver opinion detection on texts in general, as well as on detected properties. The basic detector is rule-based; the deluxe detector uses machine learning.

Property Tagger

This module implements a tagger for hotel properties. It detects words that represent aspects or properties of a hotel and links them to the correct aspect class. These properties are important because they are usually the target of the opinions. For instance, it would detect that “bed” or “mattress” are hotel properties related to the aspect SLEEPING_COMFORT. It works for all the OpeNER languages and uses the KAF format for its input and output files.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/property-tagger.
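
The idea of linking aspect words to aspect classes can be sketched as a simple lexicon lookup. The Python below is only an illustration under stated assumptions: the two-entry lexicon is hypothetical (built from the example above), tag_properties is an invented helper, and the real component works on KAF documents rather than word lists.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon built from the example in the text;
# the real lexicons belong to the component itself.
PROPERTY_LEXICON: Dict[str, str] = {
    "bed": "SLEEPING_COMFORT",
    "mattress": "SLEEPING_COMFORT",
}


def tag_properties(lemmas: List[str]) -> List[Tuple[str, str]]:
    """Return (lemma, aspect class) pairs for lemmas found in the lexicon."""
    tagged = []
    for lemma in lemmas:
        aspect = PROPERTY_LEXICON.get(lemma.lower())
        if aspect is not None:
            tagged.append((lemma, aspect))
    return tagged


tags = tag_properties(["the", "bed", "be", "comfortable"])
```

Looking up lemmas rather than surface forms means that "beds" and "bed" would map to the same entry once a POS tagger has supplied the lemma.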

Polarity Tagger

The polarity tagger assigns polarity information to words in the text. This information marks whether a word is positive or negative (good, cheap and clean can be positive words in the hotel domain) and whether it acts as a polarity modifier (negators, intensifiers, etc.). The tool is based on a set of lexicons that contain the polarity information, and new custom lexicons can be added. The polarity information is later used to detect and extract complete opinions.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/polarity-tagger.
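
A lexicon-based polarity tagger with support for custom lexicons might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the lexicon entries beyond "good", "cheap" and "clean", the label names, and the tag_polarity helper are hypothetical and do not reflect the component's actual lexicons or output.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical entries; only good/cheap/clean come from the text above.
POLARITY_LEXICON: Dict[str, str] = {
    "good": "positive",
    "cheap": "positive",
    "clean": "positive",
    "dirty": "negative",
}
MODIFIER_LEXICON: Dict[str, str] = {
    "not": "negator",
    "very": "intensifier",
}


def tag_polarity(
    tokens: List[str],
    custom_lexicon: Optional[Dict[str, str]] = None,
) -> List[Tuple[str, Optional[str]]]:
    """Label each token as positive, negative, negator, intensifier or None.

    Entries in custom_lexicon override the default polarity lexicon.
    """
    lexicon = {**POLARITY_LEXICON, **(custom_lexicon or {})}
    tagged = []
    for token in tokens:
        word = token.lower()
        label = lexicon.get(word) or MODIFIER_LEXICON.get(word)
        tagged.append((token, label))
    return tagged


tagged = tag_polarity(["not", "very", "clean"])
```

A domain-specific lexicon can be layered on top of the defaults, for example tag_polarity(["noisy"], {"noisy": "negative"}).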

Opinion Detector Basic

The basic opinion detector detects and extracts fine-grained opinions from an input KAF file. For each opinion it detects the opinion expression (the actual opinion), the target (what the opinion is about) and the holder (who is expressing the opinion). This basic version relies on the evidence provided by the polarity information and a set of rules.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/opinion-detector-basic.

Opinion Detector Deluxe

Like the basic version, the deluxe opinion detector extracts fine-grained opinions. It is based on machine learning and uses two algorithms (Conditional Random Fields and Support Vector Machines) to induce models from annotated data. Models for hotel, restaurant and attraction reviews are provided, as well as for political news, and the tool can train a model for a new domain if annotated data for that domain is available.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/opinion-detector.

Storage


These components store the results of OpeNER pipelines in a database or on Amazon S3. You can use them as webservices, or as examples of how to build your own storage solution.

Outlet

The outlet is a component that stores its input in a MySQL database. Given the matching unique ID, it also retrieves the stored documents from the database.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/outlet.
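
For readers who want to build their own storage solution, the store-and-retrieve-by-ID pattern can be sketched as follows. This Python example is an assumption throughout: it uses an in-memory SQLite database as a stand-in for MySQL so that it runs without a server, and the table layout and function names are hypothetical rather than taken from the outlet itself.

```python
import sqlite3
import uuid
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a single documents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def store(conn: sqlite3.Connection, body: str) -> str:
    """Save a document and return the unique ID needed to fetch it again."""
    doc_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO documents (id, body) VALUES (?, ?)", (doc_id, body)
    )
    conn.commit()
    return doc_id


def retrieve(conn: sqlite3.Connection, doc_id: str) -> Optional[str]:
    """Return the stored document, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT body FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None


conn = open_store()
doc_id = store(conn, "<KAF/>")
document = retrieve(conn, doc_id)
```

Swapping SQLite for MySQL would mainly change the connection setup and the parameter placeholder style; the overall pattern stays the same.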

S3 Outlet

The S3 outlet is a modified version of the normal outlet. It uses Amazon S3 to store the input. When using the S3 outlet you can choose which buckets and directories to store to.

Converters


KAF is a useful format, but there are others. JSON, for example, is well known and plugs easily into JavaScript visualisations. NAF is an evolution of KAF that is used in several other NLP projects.

kaf2json

XML is not always the best format to display results. That’s why there is a kaf-to-json parser that transforms any KAF document into a roughly equivalent JSON object.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/kaf2json.
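
To give a feel for what such a conversion involves, the Python sketch below turns the word forms of a tiny KAF document into JSON. It is a simplified illustration: the sample document is made up, kaf_to_dict is a hypothetical helper, and the JSON key names are assumptions that do not reproduce the actual output of the kaf2json service.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Made-up sample with only the text layer; real KAF documents carry more layers.
SAMPLE_KAF = """<KAF xml:lang="en" version="2.1">
  <text>
    <wf wid="w1" sent="1" para="1">Nice</wf>
    <wf wid="w2" sent="1" para="1">hotel</wf>
  </text>
</KAF>"""

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def kaf_to_dict(kaf_xml: str) -> Dict[str, Any]:
    """Extract the language and word forms from a KAF string."""
    root = ET.fromstring(kaf_xml)
    tokens = [
        {"id": wf.get("wid"), "sentence": int(wf.get("sent")), "text": wf.text}
        for wf in root.iter("wf")
    ]
    return {"language": root.get(XML_LANG), "tokens": tokens}


converted = kaf_to_dict(SAMPLE_KAF)
as_json = json.dumps(converted)
```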

kaf-naf parser

The KAF-NAF parser can translate KAF to NAF, and NAF to KAF. KAF is the format used by OpeNER. NAF is a slightly different format used by, for example, the NewsReader project.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/kaf-naf-parser.

Processors / Aggregators


These components aggregate, calculate over or otherwise change the KAF document. Where most other components follow a KAF-in, KAF-out policy, the processors may output completely different formats.

Sentiment Scores

This component aggregates all sentiments found in a document and calculates an overall sentiment score as well as sentiment scores for the individual properties detected. The resulting score ranges from -1 to 1.

More information about the webservice can be found at its endpoint. The endpoint for this webservice is: http://opener.olery.com/scorer.
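
The scoring formula is not given here. One simple way to obtain a score between -1 and 1 is to average opinion polarities, overall and per property, as in the Python sketch below. The averaging rule, the sentiment_scores helper and the STAFF class name are assumptions for illustration and may differ from what the component actually does.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Each opinion is (property or None, polarity), with polarity +1 or -1.
Opinion = Tuple[Optional[str], int]


def sentiment_scores(opinions: Iterable[Opinion]) -> Tuple[float, Dict[str, float]]:
    """Return (overall score, score per property), each in the range [-1, 1].

    Assumption: a score is the mean polarity of the opinions it covers.
    """
    all_values: List[int] = []
    by_property: Dict[str, List[int]] = defaultdict(list)
    for prop, polarity in opinions:
        all_values.append(polarity)
        if prop is not None:
            by_property[prop].append(polarity)
    overall = sum(all_values) / len(all_values) if all_values else 0.0
    per_property = {
        prop: sum(values) / len(values) for prop, values in by_property.items()
    }
    return overall, per_property


overall, per_property = sentiment_scores([
    ("SLEEPING_COMFORT", 1),
    ("SLEEPING_COMFORT", 1),
    ("STAFF", -1),  # hypothetical aspect class
    (None, 1),      # opinion not linked to any property
])
```

With these four opinions the overall score is 0.5, SLEEPING_COMFORT scores 1.0 and STAFF scores -1.0.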

This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no 261712.