ELIA ND Athens Keynote: Machine Translation – The Patent Case
Machine Translation: The Patent Case
Paul Schwander, European Patent Office
About the European Patent Office (EPO)
- The EPO provides patent protection in up to 40 European countries, based on a single application in one of the three official languages (German, English, French)
- There are 38 member states with 27 languages
- Of about 7,000 employees, approx. 60% are patent examiners
Machine Translation
- The “unitary patent” is a European patent granted under the European Patent Convention; the EU intends to have the first unitary patent granted in 2013. Heavy use of machine translation is expected, mostly for search purposes and to give patent examiners access to data
- MT is not adequate for use in legal procedures
- Highly technical and legal language makes MT difficult (e.g. a swimming pool may be described as a “water-retaining recreational structure”)
- The EPO needs to develop the skill to “translate the machine translation”
Machine Translation use in the EPO
- Search in AB and MT full text > find relevant patents > order human translation
- MT for Asian languages: Chinese, Korean and Japanese
- Translation for language groups: 27 languages to EN, FR, DE
- 160 language pairs
- Translation quality: “final” versus “minimum”; the former for publication, the latter for search purposes
- MT engines used: Language Weaver and Google Translate; there are concerns about confidentiality with the Google Translate tool
- Uses a low-cost LSP in China for human translation
- Out of 140,000 patents, only a few hundred need to be translated by humans
- Building specific translation memories (TM): once the patents are published, they enter the public domain and the translations can be aligned to create TMs
- Corpora building: find document pairs > find documents > scan/OCR > XML (WIPO ST.36) > aligned sentences (TMX) > dictionaries (XML)
- 4,000 translation requests per day; mostly DE to EN, EN to FR, FR to EN, and EN to ES
- Quality level ranking:
  - Accurate
  - Fluent
  - Actionable
  - May be actionable
  - Not useful
- Collects feedback on translation quality online
- Special corpora collection is needed in Turkish, Czech, Slovak, Bulgarian, Estonian, Romanian, Icelandic, Croatian, Slovenian, Latvian, Lithuanian, Albanian
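The "aligned sentences (TMX)" step of the corpora-building pipeline listed above can be illustrated with a minimal sketch. It assumes the sentence pairs have already been aligned and simply serialises them as a TMX 1.4 document using Python's standard library. The function name `build_tmx`, the header values and the sample pair are illustrative assumptions, not details from the talk or the EPO's actual tooling.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang, tgt_lang):
    """Return a TMX 1.4 document (as a string) for aligned sentence pairs.

    `pairs` is an iterable of (source_sentence, target_sentence) tuples.
    """
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-aligner",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        # One translation unit (tu) per aligned pair, one variant (tuv) per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    # Invented sample pair, for illustration only.
    print(build_tmx([("Schwimmbecken", "swimming pool")], "de", "en"))
```

A file produced this way can be loaded into most translation-memory tools; a dictionary-extraction step would then work from the same aligned units.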
Conclusions
- MT has positively changed the patent documentation landscape
- MT is a strategic priority for the EPO: it is impossible to operate without it, but more work is needed to improve quality