A European data market
As laid down in the European Strategy for data in 2020, data is the fuel of the 21st century.
This is the reason why the European Commission has launched the concept of Common European Data Spaces. Data Spaces will ensure that more data becomes available for use in the economy, society and research, while keeping the companies and individuals who generate the data in control. They will bring together relevant data infrastructures and governance frameworks to facilitate data pooling and sharing (from the Staff working document on data spaces).
One of the Data Spaces currently under development is the Language Data Space (LDS). Through it, relevant stakeholders, e.g., from the publishing, language technology or press industry, will be able to share and also monetise their language data and other language resources (e.g., language models) through a single platform, taking EU values and compliance with EU rules fully into account. As a result, the LDS will significantly increase the much-needed availability of clean, high-quality, compliant language data to support the development of state-of-the-art language technologies (LT) as well as the creation and deployment of large language models (LLMs) and other AI-based LT services for a range of businesses.
In particular, the creation of the LDS platform is intended to mark a turning point in the approach to the collection of language resources: the LDS will help European industry to compete globally with the language technology services provided by US or Chinese companies, and to build trust throughout the language data sharing process.
The service contract between the European Commission and a consortium of four partners for the creation and implementation of the Common European Language Data Space – or LDS for short – began on 19 January 2023.
The service contract for the development of the LDS has a set duration of 36 months but may be renewed for a further 12 months.
The project tasks are structured around five main areas of activity: project coordination; governance, i.e., the set of rules for accessing and sharing LDS data; infrastructure, i.e., the LDS technical architecture; promotion campaigns; and data protection and copyright compliance.
The LDS will be designed to implement the principle of data sovereignty, i.e., data holders will always retain control over their data. At the same time, it will provide an infrastructure compliant with EU rules and values, in which data can be further valorised through secure, trustworthy exchanges and monetisation.
All organisations dealing with language data, i.e., private and public entities of all sizes from industry – in different sectors, from advertising to media, from publishing to language technology – as well as research, public administration, cultural associations, libraries, archives, NGOs and citizens, are welcome to participate in the LDS.
The Common European Language Data Space will be developed in line with the European Data Strategy and the DIGITAL Work Programme, i.e., with a focus on the deployment of EU-compliant infrastructures and services aimed at strengthening the technological sovereignty, economic competitiveness and innovation potential of Europe's industry.