ASLIB London: Machine Translation at the European Commission

The Association for Information Management: Translating and the Computer Conference – 17 & 18 November 2011

Spyridon Pilos, European Commission

Overview:

–          EU official languages: 23

–          EC procedural languages: EN, FR, DE

–          DGT (Directorate-General for Translation): 1,750 linguists and 600 support staff

Past use of MT@EC:

–          Rule-based MT (Systran)

–          Developed between 1975 and 1998

–          28 languages available

–          Since 2006 only linguistic maintenance work

–          Suspended in December 2010

Future use of MT@EC:

–          Data-driven system

–          Users and services <> dispatcher <> MT engines <> MT data and language resources

–          Adopted in June 2010 by DGT

–          Work along three lines:

                 -Data

                 -Engines

                 -Service

–          MT data: started with internal DGT translation memories

–          MT engines: set up MT engines and develop the necessary know-how and processes; started with open-source tools (the Moses SMT system); also looking at Apertium

–          Built 52 engines (including Irish)

–          Limited access to engines since March 2011

–          Maturity check in April-May 2011

–          64 translators from 22 language departments contributed more than 17,000 yes/no judgments on the usefulness of segment translations; 10 language departments opted for MT in the real-life trial

–          Real-life trial since July 2011: automatic MT for EN>BG, DA, ES, FR, IT, NL, PL, PT, RO, SV

–          Manual MT possible for all DGT staff for all 52 engines

–          Over 850K pages translated

–          Next: improve SMT language pairs to use as benchmarks

–          MT Service: started with proof of concept (October 2010 to April 2011)

–          June 2012: start construction

–          July 2013: operational baseline MT@EC service

–          By the end of 2011: DGT Translation Memory (“Acquis”) on the JRC website; currently more than 2 million source segments and almost 17 million target segments
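
Illustration (not from the talk): the layered design listed under "Future use of MT@EC" (users and services <> dispatcher <> MT engines <> MT data and language resources) can be pictured as a dispatcher that routes each request to the engine for its language pair. The sketch below is only a minimal illustration under that assumption. The class `Dispatcher`, the helper `make_stub_engine`, and the stub engines are hypothetical names invented here; they do not describe the Commission's actual service, and a real engine such as Moses would sit behind the same kind of interface.

```python
from typing import Callable, Dict, Tuple

# Hypothetical types: an "engine" is any function mapping source text to target text.
Engine = Callable[[str], str]
LanguagePair = Tuple[str, str]


class Dispatcher:
    """Routes requests from users and services to the engine for a language pair."""

    def __init__(self) -> None:
        self._engines: Dict[LanguagePair, Engine] = {}

    def register(self, source: str, target: str, engine: Engine) -> None:
        # Language codes are normalised to upper case, e.g. ("EN", "FR").
        self._engines[(source.upper(), target.upper())] = engine

    def translate(self, text: str, source: str, target: str) -> str:
        pair = (source.upper(), target.upper())
        engine = self._engines.get(pair)
        if engine is None:
            raise LookupError(f"no engine registered for {pair[0]}>{pair[1]}")
        return engine(text)


def make_stub_engine(target: str) -> Engine:
    # Stand-in for a real MT engine: it only tags the text with the target code.
    def engine(text: str) -> str:
        return f"[{target}] {text}"

    return engine


# Register the ten EN> pairs named for the real-life trial.
dispatcher = Dispatcher()
for target in ["BG", "DA", "ES", "FR", "IT", "NL", "PL", "PT", "RO", "SV"]:
    dispatcher.register("EN", target, make_stub_engine(target))

print(dispatcher.translate("Hello", "en", "fr"))
```

Keeping the routing table separate from the engines means a language pair can be added or swapped without touching the calling services, which is one plausible reason for placing a dispatcher between the two layers.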