ASLIB London: Machine Translation at the European Commission
The Association for Information Management: Translating and the Computer Conference – 17 & 18 November 2011
Spyridon Pilos, European Commission
Overview:
– EU official languages: 23
– EC procedural languages: EN, FR, DE
– DGT (Directorate-General for Translation): 1,750 linguists and 600 support staff
Past use of MT@EC:
– Rule-based MT (Systran)
– Developed between 1975 and 1998
– 28 languages available
– Since 2006 only linguistic maintenance work
– Suspended in December 2010
Future use of MT@EC:
– Data-driven system
– Users and services <> dispatcher <> MT engines <> MT data and language resources
– Adopted in June 2010 by DGT
– Work along three lines:
  - Data
  - Engines
  - Service
– MT data: started with internal DGT translation memories
– MT engines: set up MT engines and develop the necessary know-how and processes; started with open-source tools (the SMT system Moses); also looking at Apertium
– Built 52 engines (including Irish)
– Limited access to engines since March 2011
– Maturity check in April-May 2011
– 64 translators from 22 language departments contributed more than 17,000 yes/no judgments on the usefulness of segment translations; 10 language departments opted for MT in the real-life trial
– Real-life trial since July 2011: automatic MT for EN>BG, DA, ES, FR, IT, NL, PL, PT, RO, SV
– Manual MT possible for all DGT staff for all 52 engines
– Over 850,000 pages translated
– Next: improve SMT language pairs to use as benchmarks
– MT Service: started with a proof of concept (October 2010 to April 2011)
– June 2012: start construction
– July 2013: operational baseline MT@EC service
– By the end of 2011: DGT Translation Memory (“Acquis”) available on the JRC website; it currently holds more than 2 million source segments and almost 17 million target segments
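
Illustration: the layout listed under "Future use of MT@EC", with a dispatcher sitting between users/services and the MT engines, can be pictured with a minimal sketch. This is not the MT@EC implementation. The names Dispatcher, register, translate and placeholder_engine are hypothetical, and the "engine" here is a stand-in that only labels the text rather than calling Moses or any real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Assumption: an "engine" is any callable mapping source text to target text.
Engine = Callable[[str], str]


@dataclass
class Dispatcher:
    """Routes a translation request to the engine registered for its language pair."""
    engines: Dict[Tuple[str, str], Engine] = field(default_factory=dict)

    def register(self, source: str, target: str, engine: Engine) -> None:
        # Language codes are normalised to upper case, e.g. ("EN", "FR").
        self.engines[(source.upper(), target.upper())] = engine

    def translate(self, text: str, source: str, target: str) -> str:
        key = (source.upper(), target.upper())
        if key not in self.engines:
            raise KeyError(f"no engine registered for {key[0]}>{key[1]}")
        return self.engines[key](text)


def placeholder_engine(tag: str) -> Engine:
    # Hypothetical stand-in for a real MT engine: it only prefixes a label.
    return lambda text: f"[{tag}] {text}"


dispatcher = Dispatcher()
dispatcher.register("EN", "FR", placeholder_engine("EN>FR"))
print(dispatcher.translate("Hello", "en", "fr"))
```

Keeping the routing table keyed by language pair means an engine can be added or swapped without the calling services having to change.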