Computer science expert using natural language processing to improve equality in language technologies

Computer science researcher Antonis Anastasopoulos uses his love for computer science, language, and linguistics to improve equality in language technologies.

When people ask Siri, Alexa, or Google Assistant a question, they expect the programs to understand them, but that is not always the case, he says.

Antonis Anastasopoulos, photo provided.

A person’s language, accent, dialect, and even gender can have an impact, preventing the system from interpreting them correctly, says Anastasopoulos, an assistant professor and an expert in natural language processing, which is how computers attempt to process and understand human languages.

“The systems don’t work equally well for everyone,” says Anastasopoulos, who speaks Greek (his native language), English, German, Swedish, Italian, and some Spanish.

He is one of several co-principal investigators who received a new National Science Foundation-Amazon grant for their research, “Quantifying and Mitigating Disparities in Language Technologies.”

In the fall, Anastasopoulos also won a  for a project on how accent and dialect impact language technologies.

For the NSF grant, he and experts from Carnegie Mellon University and the University of Washington are studying areas where there is bias in language technologies and measuring the discrepancies. Then they will attempt to mitigate the inequalities.

“We want to measure the extent to which the diversity of language affects the utility that speakers get from language technologies,” Anastasopoulos says. “We will focus on automatic translation and speech recognition since they are perhaps the most commonly used language technologies throughout the world.”

His research will apply to all languages. It’s important to look deeply into languages for differences because languages are flexible and diverse, he says. “There are many regional variations that are different from the standard.”

He also recently received a $350,000 grant from the National Endowment for the Humanities (NEH) to build optical character recognition tools to convert scanned images of text to a machine-readable format for endangered languages.

“We are working on training machine-learning models to process images and texts in the books and documents of indigenous languages from Central and South America so that these works can be made accessible to everyone,” he says. “We are building technologies to study those languages computationally.”

Anastasopoulos is also part of a prestigious group of machine-translation researchers, including experts from Facebook, Google, Amazon, and Microsoft, who are creating automatic tools that translate COVID-19-related content for communities where people don’t speak the languages most often used by large health organizations, including the World Health Organization.

“We are working closely with Translators without Borders. So far, we have produced terminologies for more than 200 languages and a large dataset for 35 languages, some of them extremely under-served by the current solution.”