Conference paper

Boosting Neural Machine Translation with Similar Translations

Jitao Xu (1), Josep-Maria Crego (2), Jean Senellart (2)
(1) TLP - Traitement du Langage Parlé
LIMSI - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Abstract: This paper explores data augmentation methods for training Neural Machine Translation to make use of similar translations, much as a human translator employs fuzzy matches. In particular, we show how we can simply feed the neural model with information on both the source and target sides of the fuzzy matches. We also extend the notion of similarity to include semantically related translations retrieved using distributed sentence representations. We show that translations based on fuzzy matching provide the model with "copy" information, while translations based on embedding similarities tend to extend the translation "context". Results indicate that the effects of both kinds of similar sentences add up to further boost accuracy, combine naturally with model fine-tuning, and provide dynamic adaptation for unseen translation pairs. Tests on multiple data sets and domains show consistent accuracy improvements. To foster research around these techniques, we also release an open-source toolkit with an efficient and flexible fuzzy-match implementation.

Cited literature [29 references]

https://hal.archives-ouvertes.fr/hal-02956324
Contributor: Limsi Publications
Submitted on: Friday, October 2, 2020 - 16:23:33
Last modified on: Saturday, October 10, 2020 - 03:26:01

File

2020.acl-main.144.pdf
Publisher files allowed on an open archive

Citation

Jitao Xu, Josep-Maria Crego, Jean Senellart. Boosting Neural Machine Translation with Similar Translations. Annual Meeting of the Association for Computational Linguistics, Jul 2020, Seattle, United States. pp.1570-1579, ⟨10.18653/v1/2020.acl-main.143⟩. ⟨hal-02956324⟩
