ELIA ND Athens Keynote: Machine Translation – The Patent Case

 

Machine Translation: The Patent Case

Paul Schwander, European Patent Office

 

About the European Patent Office (EPO)

  • The EPO provides patent protection in up to 40 European countries, based on a single application in one of the three official languages (German, English, French)
  • There are 38 member states with 27 languages
  • Of about 7,000 employees, approx. 60% are patent examiners

Machine Translation

  • “The Unitary Patent” is a European patent granted under the European Patent Convention; the EU intends to have the first unitary patent granted in 2013; heavy use of machine translation is expected (mostly for search purposes and to give patent examiners access to data)
  • MT is not adequate for use in legal procedures
  • Highly technical and legal language makes MT difficult (e.g. swimming pool = “water-retaining recreational structure”)
  • The EPO needs to develop the skill to “translate the machine translation”

Machine Translation use in the EPO

  • Search in AB and MT full text > find relevant patents > order human translation
  • MT for Asian languages: Chinese, Korean and Japanese
  • Translation for language groups: 27 languages to EN, FR, DE
  • 160 language pairs
  • Translation quality: “final” versus “minimum”; the first for publication, the latter for search purposes
  • MT engines used: Language Weaver and Google Translate; there are confidentiality concerns with the Google Translate tool
  • Uses a low-cost LSP in China for human translation
  • Out of 140,000 patents, only a few hundred need to be translated by humans
  • Building specific translation memories (TM): once patents are published, they enter the public domain and their translations can be aligned to create TM
  • Corpora building: find document pairs > find documents > scan/OCR > XML (ST.36) > aligned sentences (TMX) > dictionaries (XML)
  • 4,000 translation requests per day; mostly DE to EN, EN to FR, FR to EN, and EN to ES
  • Quality level ranking:
  1. Accurate
  2. Fluent
  3. Actionable
  4. May be actionable
  5. Not useful
  • Collects feedback on translation quality online
  • Special corpora collection is needed in Turkish, Czech, Slovak, Bulgarian, Estonian, Romanian, Icelandic, Croatian, Slovenian, Latvian, Lithuanian, Albanian
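
The last steps of the corpora-building chain above (aligned sentences stored as TMX) can be illustrated with a short sketch. This is not the EPO's tooling, which the talk did not describe. It is a minimal example that assumes the sentence pairs have already been aligned and writes them as TMX 1.4 translation units using only the Python standard library. The function names, the tool name in the header and the sample sentence pair are hypothetical.

```python
import xml.etree.ElementTree as ET

# Clark notation for the reserved xml:lang attribute; ElementTree
# serializes it with the built-in "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang, tgt_lang):
    """Build a TMX tree from already-aligned (source, target) sentence pairs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-aligner",   # hypothetical tool name
        "creationtoolversion": "0.1",        # hypothetical version
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        # One translation unit per aligned pair, one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


def tmx_to_string(tmx):
    """Serialize the TMX tree to a Unicode string."""
    return ET.tostring(tmx, encoding="unicode")


if __name__ == "__main__":
    # Made-up sentence pair, for illustration only.
    sample = [("A swimming pool.", "Ein Schwimmbecken.")]
    print(tmx_to_string(build_tmx(sample, "en", "de")))
```

In a real pipeline the pairs would come from a sentence aligner run on the OCR output, and the resulting file could be loaded into any TM tool that reads TMX.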

Conclusions

  • MT has positively changed the patent documentation landscape
  • MT is a strategic priority for the EPO: it is impossible to operate without it, but more work is needed to improve quality