Cross-lingual Resources
About
We include several cross-lingual resources here, such as multilingual embeddings and parallel corpora.
Acknowledgement
This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via contract FA8650-17-C-9116. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
Multilingual Entity Linker for 282 Languages
Parallel Corpora
The parallel corpus files can be accessed here. Check the table below for the mapping between language pairs and filenames.
Filenames
Language I | Language II | Filename |
---|---|---|
Amharic | English | am-en.json |
Arabic | English | ar-en.json |
Bangla | English | bn-en.json |
Chinese | English | zh-en.json |
Hausa | English | ha-en.json |
Hungarian | English | hu-en.json |
Persian | English | fa-en.json |
Russian | English | ru-en.json |
Somali | English | so-en.json |
Spanish | English | es-en.json |
Tamil | English | ta-en.json |
Thai | English | th-en.json |
Turkish | English | tr-en.json |
Uyghur | English | ug-en.json |
Urdu | English | ur-en.json |
Uzbek | English | uz-en.json |
Vietnamese | English | vi-en.json |
Yoruba | English | yo-en.json |
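
Every filename in the table follows the pattern `<language code>-en.json`, so the lookup can be scripted. The snippet below is a minimal sketch, not part of the released resources: `LANGUAGE_CODES`, `corpus_filename`, and `load_corpus` are hypothetical helper names, and `load_corpus` assumes each file is a single JSON document, which this page does not confirm. If the files turn out to be line-delimited JSON, the loading step would need to parse one line at a time instead.

```python
import json
from pathlib import Path

# Language codes copied from the filename column of the table above.
LANGUAGE_CODES = {
    "Amharic": "am", "Arabic": "ar", "Bangla": "bn", "Chinese": "zh",
    "Hausa": "ha", "Hungarian": "hu", "Persian": "fa", "Russian": "ru",
    "Somali": "so", "Spanish": "es", "Tamil": "ta", "Thai": "th",
    "Turkish": "tr", "Uyghur": "ug", "Urdu": "ur", "Uzbek": "uz",
    "Vietnamese": "vi", "Yoruba": "yo",
}


def corpus_filename(language: str) -> str:
    """Return the corpus filename for a language paired with English."""
    try:
        code = LANGUAGE_CODES[language]
    except KeyError:
        raise ValueError(f"no parallel corpus listed for {language!r}") from None
    return f"{code}-en.json"


def load_corpus(language: str, corpus_dir: str):
    """Load one corpus file from a local directory.

    Assumption: the file is a single JSON document. Its internal
    structure is not described on this page, so the parsed object is
    returned as-is for the caller to inspect.
    """
    path = Path(corpus_dir) / corpus_filename(language)
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `corpus_filename("Tamil")` returns `"ta-en.json"`, and `load_corpus("Tamil", "corpora")` would read `corpora/ta-en.json`, where `corpora` stands for whatever local directory holds the downloaded files.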