Since August 2017 I have been a PhD student in the Center for Language and Speech Processing (CLSP) at Johns Hopkins University (JHU), as well as an Associate Research Scientist at the JHU Human Language Technology Center of Excellence (HLTCOE). I am advised by Philipp Koehn and funded by a National Defense Science and Engineering Graduate (NDSEG) Fellowship.

My work focuses on natural language processing (NLP), especially machine translation (MT), including parallel data curation, automatic MT evaluation, multilingual MT, low-resource MT, and domain adaptation.

I developed Vecalign, an accurate sentence alignment algorithm based on multilingual sentence embeddings, whose complexity is linear in the number of sentences being aligned. In conjunction with LASER, Vecalign works in about 100 languages (i.e., roughly 100^2 language pairs) without the need for a machine translation system or lexicon. As of November 2019, Vecalign has the best reported performance on the test set released with Bleualign.

I also developed Prism, an automatic MT metric that uses a sequence-to-sequence paraphraser to score MT system outputs conditioned on their respective human references. Prism uses a multilingual neural MT model as a zero-shot paraphraser, which eliminates the need for synthetic paraphrase data and yields a single model that works in many languages (we release a model covering 39 languages). In segment-level correlation with human judgments, Prism outperforms or statistically ties with all metrics submitted to the WMT 2019 metrics shared task.

Prior to starting my PhD, I was in the Human Language Technology group at MIT Lincoln Laboratory.


