Databases and Natural Language Processing

Research conducted by the Databases and Natural Language Processing team (in French, BdTln, for Bases de Données et Traitement de Langage Naturel) covers a wide variety of challenges concerning unstructured, semi-structured and structured data. The team is composed of 4 full professors, 11 associate professors (4 of whom hold a Habilitation à diriger des recherches) and 7 PhD students.

The team is organized into 3 subgroups, each with its own research focus:
  • Interactive Data Exploration and Analytics: this group develops data mining and OLAP techniques that allow users to interactively explore and analyse their data. It has long-standing expertise in pattern mining (for example, extracting contextual preference rules or outliers using sampling techniques), in query personalization and recommendation in databases, and in the development of user-centric OLAP (for example, using former queries logged by an OLAP server to enhance subsequent analyses). In the medium to long term, it also aims to develop declarative languages dedicated to the interactive exploration of data.
  • Natural Language Processing and Human-Systems Interactions: this group has long-standing expertise in the creation of mono- and multilingual resources and tools dedicated to named entities (NEs) and multiword expressions (MWEs), as well as referential units (coreference, temporal reference). For example, it has developed rule-based and probabilistic methods for named entity recognition; Prolexbase, a multilingual database of proper names and their morpho-syntactic variants and relations in French, Polish and English; ANCOR, the largest manually annotated French coreference corpus; and Multiflex, a formalism and a tool for the lexical description of contiguous MWEs. The group also has extensive experience in assistive technologies. In particular, it has developed the Sibylle word prediction system for alternative and augmentative communication, which has undergone large-scale end-user validation and is now used daily in therapeutic practice. All these resources and tools are distributed under open licenses.
  • Intelligent Data and Services: this group initially developed strong expertise in tree languages, tree automata and their applications to XML and semi-structured databases. Its research interests now concern the use of Semantic Web technologies (with description logics) to extract, query, or integrate data from distributed and heterogeneous data sources (using ontology-based mediation), and to classify and compose Web services. It applies its contributions in the fields of digital humanities and spatio-temporal databases.

The BdTln team participates in 2 international, 3 national, and 5 regional projects. It currently coordinates the IC1207 COST action PARSEME and is strongly involved in the Erasmus Mundus Master's Programme Information Technologies for Business Intelligence (IT4BI).