The OpeNER Project

OpeNER is a project funded by the European Commission under FP7 (the 7th Framework Programme). Its acronym stands for Open Polarity Enhanced Named Entity Recognition. It is a two-year project that officially started in July 2012 and finishes in July 2014. Partners from Italy, the Netherlands and Spain collaborate in OpeNER.

OpeNER’s main goal is to provide a set of ready-to-use tools for natural language processing tasks that are free and easy for SMEs to adapt and integrate into their workflows. More precisely, OpeNER aims to detect and disambiguate entity mentions and to perform sentiment analysis and opinion detection on texts, for example to extract the sentiment and opinions of customers about a certain resource (e.g. hotels and accommodation) from Web reviews.

Vision/Goals

Customer reviews and ratings on the Internet are increasingly important in the evaluation of products and services by potential customers. In certain sectors, they are even becoming a fundamental variable in the purchase decision. A recent Forrester study showed that more than 30% of Internet users have evaluated products online, and that 70% of them studied end-user-generated reviews.

This trend will continue with the growth of Social Media and access to Information and Communication Technologies (ICT). Consumers tend to trust the opinion of other consumers, especially those with prior experience of a product or service, rather than company marketing. The role of user comments is of particular importance when there is little differentiation between the product offers.

Sentiment Analysis and Opinion Mining are established, although still young, fields of research, development and innovation. The goal is always broadly the same: to know “Who” is speaking about “What”, “When” and in “What sense”.

These factors have led to a burgeoning industry, with a plethora of companies offering Sentiment Analysis services in Social Media. While most offer a generic service, typically in just one language, several companies have specialised in services specific to tourism because of its bounded domain, demonstrable value, and the high level of adoption of Internet technologies by both suppliers and consumers.

Tourism is also an application domain with limited scope and variation, and a high dependency on multilingual sentiment analysis and on the detection and classification of a wide range of common Named Entities.

Named Entity Recognition and Classification (NERC) is important in determining roles. Once multilingualism and cultural skew are introduced, the complexity of the challenge increases manifold. OpeNER will create base technologies for Cross-lingual NERC and Sentiment Analysis that will enable industry users both to implement and to contribute to a basic set of core technologies that they all require, allowing them to focus their efforts on providing tailored and innovative solutions at the rules and analysis levels.

The OpeNER project will provide a rich Named Entity Data Source in a simple, structured and standardised format. The Named Entity Detection will be capable of marking Named Entities in the same format irrespective of the text under analysis or the language of the text. The project will also provide linking modules that are capable of matching locally detected Named Entities with generic data.

Objectives

OpeNER aims to provide enterprise and society with base technologies for Cross-lingual Named Entity Recognition and Classification and Sentiment Analysis through the reuse of existing resources and the open development of complementary technologies. The key objectives of the project are:

	Repurposing of existing language resources and generation of a reference generic multilingual sentiment lexicon with cultural normalisation and scales.
	An extension lexicon for the tourism sector in different languages (Spanish, Dutch, German, Italian, English and French).
	Named Entity Recognition and Classification in the same set of target languages as the Sentiment Lexicon, extensible to other languages by leveraging multilingual resources such as Wikipedia and Linked Data.
	Development and open availability of validated reference Sentiment and Opinion Mining techniques and tools based on the results of the project.
	Validation of the project results, principally in the tourism sector, with leading SMEs in the sector and with the support of several stakeholders as part of the End User Advisory Board.
	Research and trialling of models that will ensure that the project results are self-sustainable and economically viable in the long term.
	Achievement of the project’s objectives by repurposing and leveraging existing state-of-the-art and established language resources.

Consortium

The following members are part of the original OpeNER consortium.

	Project Coordinator - Vicomtech-IK4 is an applied research centre for Interactive Computer Graphics and Multimedia located in the Technology Park of San Sebastian (Spain). It is a non-profit foundation, founded in 2001 as a joint venture by the INI-GraphicsNet Foundation and the EiTB Broadcasting Group. It currently has 15 members. Among its 95 employees, more than 75 are researchers dedicated to the field of applied research. The role of Vicomtech in the market is to supply society with technology through the transfer of primary research to industry.
	Technical Coordinator - Founded in 2010, Olery is a privately held company based in Amsterdam, The Netherlands. Olery develops easy-to-use tools for the leisure and hospitality industry that provide insight into online reputation and social media presence by monitoring review sites (like Booking.com) as well as social media (like Twitter and Foursquare). Besides monitoring, Olery also analyses this data. Olery weighs not only the grades that guests have given a hotel, but also the content of the accompanying written text. This is a very important feature for hotels, because with Olery hotel managers can track changes in their service.
	The Faculty of Arts at the VU University Amsterdam specialises in the development of complex semantic networks and ontologies and their application in the analysis of text. The VU has extensive experience in the evaluation of effective communication. The faculty runs a number of projects on subjectivity analysis of text. It has developed a sentiment lexicon for Dutch and is extending this lexicon with FrameNet structures for detecting opinion holders.
	The IXA Research Group at the University of the Basque Country was created in 1988. It consists of 30 computer scientists and 14 linguists. The IXA Group was created with the aim of promoting the modernization of Basque by developing advanced computational resources and systems. The IXA Group also directs the Master’s program “Linguistic processing and analysis”. IXA has coordinated the overall SemEval-2007 competition and the CLEF 2008 and 2009 Robust-WSD tasks, and has led a joint Stanford-UPV/EHU team in the 2009 TAC-KBP competition.
	Based in Pisa (Italy), SYNTHEMA is a high-technology company that was established in 1993 by computer scientists from the IBM Research Center. Since then, the company has evolved rapidly and is now a leading provider of Language and Semantic solutions, with state-of-the-art technologies for applications like Enterprise Search, Knowledge Management, Sentiment Analysis, Audio & Text Mining, Technology Watch, Competitive Intelligence, Customer Relationship Intelligence and Management, Speech Recognition and Machine Translation.
	CNR, the Italian National Research Council, is the main public Italian research institution. The Pisa Research Area of CNR is one of the biggest of the CNR interdisciplinary research areas. It hosts ten research institutes, including the Institute for Informatics and Telematics (IIT) and the Institute of Linguistic Computation (ILC). IIT has followed the evolution of the World Wide Web from the success of markup languages, through the explosion of social web content, to the introduction of a semantic layer.

End User Advisory Board