A European data market
As set out in the 2020 European Strategy for data, data is the fuel of the 21st century.
This is the reason why the European Commission supports the development of Common European Data Spaces in strategic economic sectors and domains of public interest. They will bring together relevant data infrastructures and governance frameworks to facilitate data pooling and sharing. This will allow data from across the EU to be made available and exchanged in a trustworthy and secure manner, keeping companies and individuals in control of the data they generate.[1]
The Language Data Space
One of the Data Spaces currently under development is the Language Data Space (LDS). Through it, relevant stakeholders, e.g., from the publishing, language technology or press industries, will be able to exchange and monetise their language data and other language resources (e.g., language models) through a single platform that fully respects EU values and complies with EU rules. As a result, the LDS will significantly increase the much-needed availability of clean, high-quality, compliant language data to support the development of state-of-the-art language technologies (LT) and AI-based LT services for a range of businesses.
The creation of the LDS platform aims to mark a turning point in the approach to the collection of language resources: the LDS will help European industry to compete globally with the language technology services provided by US or Chinese companies, and to build trust throughout the language data sharing process.
Factsheet
- Project start
The service contract between the European Commission and a consortium of four partners for the creation and implementation of the Common European Language Data Space – or LDS for short – began on 19 January 2023.
- Project duration
The service contract for the development of the LDS has a set duration of 36 months but may be renewed for a further 12 months.
- Planned tasks
The project tasks are structured around five main areas of activity: project coordination; governance, i.e., the set of rules for accessing and sharing LDS data; infrastructure, i.e., the LDS technical architecture; promotion campaigns; and data protection and copyright compliance.
- Design principles
The LDS will be designed to implement the principle of data sovereignty, i.e., data holders will always retain control over their data. At the same time, it will provide an infrastructure compliant with EU rules and values, in which data can be further valorised through secure, trustworthy exchanges and monetisation.
- Potential stakeholders
All organisations dealing with language data are welcome to participate in the LDS: private and public entities of all sizes from industry (in sectors ranging from advertising and media to publishing and language technology), as well as research, public administration, cultural associations, libraries, archives, NGOs and citizens.
- Strategy
The Common European Language Data Space will be developed in line with the European Data Strategy and the DIGITAL Work Programme, i.e., with a focus on deploying EU-compliant infrastructures and services aimed at strengthening the technological sovereignty, economic competitiveness and innovation potential of Europe's industry.