Cross-lingual Resources


We include several cross-lingual resources here, such as multilingual embeddings and parallel corpora.


This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract FA8650-17-C-9116. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

Multilingual Entity Linker for 282 Languages

Parallel Corpora

The parallel corpus files can be accessed here. Check the table below for the mapping between language pairs and filenames.


Language I    Language II    Filename
Amharic       English        am-en.json
Arabic        English        ar-en.json
Bangla        English        bn-en.json
Chinese       English        zh-en.json
Hausa         English        ha-en.json
Hungarian     English        hu-en.json
Persian       English        fa-en.json
Russian       English        ru-en.json
Somali        English        so-en.json
Spanish       English        es-en.json
Tamil         English        ta-en.json
Thai          English        th-en.json
Turkish       English        tr-en.json
Uyghur        English        ug-en.json
Urdu          English        ur-en.json
Uzbek         English        uz-en.json
Vietnamese    English        vi-en.json
Yoruba        English        yo-en.json
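
The filename mapping in the table can be expressed as a small lookup. The sketch below is illustrative only: the helper names (`parallel_filename`, `load_parallel`) and the local directory `parallel` are assumptions, not part of the released resources. The internal layout of the JSON files is not described on this page, so the loader simply returns whatever the standard `json` module parses.

```python
import json
from pathlib import Path

# Language name -> two-letter code, copied from the table of filenames.
LANGUAGE_CODES = {
    "Amharic": "am", "Arabic": "ar", "Bangla": "bn", "Chinese": "zh",
    "Hausa": "ha", "Hungarian": "hu", "Persian": "fa", "Russian": "ru",
    "Somali": "so", "Spanish": "es", "Tamil": "ta", "Thai": "th",
    "Turkish": "tr", "Uyghur": "ug", "Urdu": "ur", "Uzbek": "uz",
    "Vietnamese": "vi", "Yoruba": "yo",
}


def parallel_filename(language: str) -> str:
    """Return the filename of the <language>-English corpus, e.g. 'am-en.json'."""
    code = LANGUAGE_CODES[language]  # raises KeyError for unlisted languages
    return f"{code}-en.json"


def load_parallel(language: str, corpus_dir: str = "parallel"):
    """Parse one corpus file from a local directory (hypothetical location).

    The structure of the parsed object is not assumed here; inspect it
    after loading to see how sentence pairs are stored.
    """
    path = Path(corpus_dir) / parallel_filename(language)
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `parallel_filename("Somali")` yields `so-en.json`, which `load_parallel("Somali")` would then look for under the assumed `parallel` directory.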


We provide aligned cross-lingual embeddings for over 200 languages. The complete set of embeddings is accessible HERE. Please refer to THIS PAGE for the mapping between acronyms and languages.
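
Because the embeddings are aligned, vectors from different languages live in one shared space, so a word in one language can be matched to its closest word in another by cosine similarity. The sketch below illustrates that idea with made-up three-dimensional toy vectors; the words, values, and function names are assumptions for illustration, and the on-disk format of the released embeddings is not assumed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query_vec, target_vectors):
    """Return the word in target_vectors whose vector is closest to query_vec."""
    return max(target_vectors, key=lambda w: cosine(query_vec, target_vectors[w]))


# Toy vectors, invented for this example (not taken from the released files).
english = {"water": [0.9, 0.1, 0.0], "house": [0.1, 0.9, 0.1]}
spanish = {"agua": [0.8, 0.2, 0.1], "casa": [0.2, 0.8, 0.0]}

print(nearest(english["water"], spanish))  # agua
print(nearest(english["house"], spanish))  # casa
```

With real embeddings the same lookup would run over a full vocabulary, where a vectorized library or an approximate nearest-neighbor index would be more practical than this pure-Python loop.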