Fellowship Highlight: Improving Understanding with Tarjimly

By Arman Madani
Published in deltanalytics
Jul 28, 2019 · 5 min read

Delta Analytics is an SF Bay Area-based 501(c)(3) nonprofit that seeks to use Data for Good through our nonprofit data service grant and education programs. You can learn more about our programs here. If you are interested in supporting our programs, we accept financial contributions here. Please direct all other inquiries to inquiry@deltanalytics.org.
Arman Madani was a 2019 Data Fellow and is a Data Analyst at Affirm. This article is also featured on his personal blog, ArmanMadani.com.

The 2019 Data Fellowship came to a close on July 19th with a showcase of many excellent, impactful projects. One of the nonprofits partnering with Delta Analytics was Tarjimly (“translate for me” in Arabic), a translation app that pairs refugees and immigrants with volunteer translators on demand. To date, Tarjimly has helped nearly 20,000 refugees and recruited 9,000 volunteers. Thomas Vetterli, Terence Tam, Ugaso Sheik-Abdi, and I were tasked with helping Tarjimly accomplish the following:

  1. Decrease the time it takes to match translators/aid workers with refugees/immigrants (Baseline: 130 seconds)
  2. Improve the successful match rate (Baseline: 50%)

Below, we’ll outline the TL;DRs of each phase of our work.

Exploratory Analysis
Over the course of the first few weeks, the team and I worked on exploring the data to build up some domain knowledge. Primarily, we explored user metadata, request data (who was notified of a request), and session data (what happened after a successful session — like ratings). A couple of the most important observations to come out of this exploration were:

  • Many translators were underutilized or not utilized at all
  • Timing and time zone matter a lot. Something we initially overlooked was the importance of pinging translators at appropriate times relative to their time zones

We also got a chance to explore the market dynamics on the Tarjimly platform. That is, we examined the relationship between the demand for translations of a given language pairing (e.g. Arabic-English) and the supply of translators who could speak both languages. While we ultimately didn’t use the market dynamics analysis much in modeling, we were able to make changes to the codebase to scale the number of translators pinged in response to a low supply of translators (more on that below).
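As a rough illustration of the kind of supply/demand view this involves, here is a minimal sketch in pandas. The column names and data are hypothetical stand-ins, not Tarjimly's actual schema:

```python
import pandas as pd

# Hypothetical schemas: one row per translation request, one row per translator.
requests = pd.DataFrame({
    "language_pair": ["Arabic-English", "Arabic-English", "Swahili-Nepali"],
})
translators = pd.DataFrame({
    "language_pair": ["Arabic-English", "Arabic-English",
                      "Arabic-English", "Swahili-Nepali"],
})

# Count demand (requests) and supply (translators) per language pair.
demand = requests.groupby("language_pair").size().rename("n_requests")
supply = translators.groupby("language_pair").size().rename("n_translators")

market = pd.concat([demand, supply], axis=1).fillna(0)
market["demand_per_translator"] = (
    market["n_requests"] / market["n_translators"].clip(lower=1)
)
print(market.sort_values("demand_per_translator", ascending=False))
```

Pairs with a high demand-per-translator ratio are the ones where widening the pool of pinged translators matters most.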

Feature Engineering
This one is fairly straightforward. We expanded on the feature set the Tarjimly team had provided us by deriving new features from the raw data. For example, given the date on which a translator responded “yes” to a request, we broke that date out into multiple binary variables like “is this date a week end?”, “is it a month end?”, and “is it a weekday, weekend, or holiday?”

Additionally, we condensed one-to-many relationships into one-to-one relationships. For example, if one user had multiple ratings (one-to-many), we took the average score (one-to-one).

Lastly, we converted all datatypes into usable formats for modeling (datetimes and strings were converted to numbers like floats and integers).
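A minimal sketch of these three transformations, assuming a pandas DataFrame with hypothetical column names (response_date, user_id, rating):

```python
import pandas as pd

# Hypothetical raw data: one row per translator response, one row per rating.
responses = pd.DataFrame({
    "user_id": [1, 1, 2],
    "response_date": pd.to_datetime(["2019-03-31", "2019-04-05", "2019-04-06"]),
})
ratings = pd.DataFrame({"user_id": [1, 1, 2], "rating": [5, 4, 3]})

# 1. Derive binary date features from the response date.
dates = responses["response_date"]
responses["is_week_end"] = (dates.dt.dayofweek == 6).astype(int)  # Sunday
responses["is_month_end"] = dates.dt.is_month_end.astype(int)
responses["is_weekend"] = (dates.dt.dayofweek >= 5).astype(int)

# 2. Condense a one-to-many relationship (user -> ratings) into one-to-one.
avg_rating = ratings.groupby("user_id")["rating"].mean().rename("avg_rating")
features = responses.merge(avg_rating, on="user_id", how="left")

# 3. Convert remaining datetimes to numbers so models can consume them.
features["response_ts"] = features["response_date"].astype("int64") // 10**9
print(features)
```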

(Ok, I promise the next couple are actually more “TL;DR-ish” because they’re more abstract and partly covered above.)

Feature Helpers/Extractors Library
All the feature engineering work, including calculations and datatype conversions, was packaged into a Python library that creates the features (via “feature helpers”) and then saves/caches the extracted features (via “feature extractors”) so we can run the data through our models.
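A minimal sketch of the helper/extractor pattern; the function and file names here are hypothetical, not the library's actual interfaces:

```python
from pathlib import Path
import pandas as pd

# Hypothetical "feature helper": a pure function that derives one feature.
def is_month_end(df: pd.DataFrame) -> pd.Series:
    return df["response_date"].dt.is_month_end.astype(int)

FEATURE_HELPERS = {"is_month_end": is_month_end}

# Hypothetical "feature extractor": runs the helpers and caches the result
# to disk so repeated model runs don't recompute everything.
def extract_features(df: pd.DataFrame, cache_path: Path) -> pd.DataFrame:
    if cache_path.exists():
        return pd.read_parquet(cache_path)
    features = pd.DataFrame({name: fn(df) for name, fn in FEATURE_HELPERS.items()})
    features.to_parquet(cache_path)
    return features
```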

Codebase Refactors
We refactored the code to make it more efficient, track our model performance, and scale the number of translators notified of a request (the “translator pool”) in response to certain triggers. To track our machine learning models, we implemented MLflow, a platform for managing the machine learning lifecycle. The triggers we used to scale the translator pool are based on the number of available translators for a given language pair (e.g. the number of translators who speak Swahili and Nepali is very low compared to the number who speak English and Arabic) and on how many requests the refugee/immigrant/aid worker has made previously.
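For illustration, MLflow-style experiment tracking and a pool-scaling trigger look roughly like the sketch below. The parameter values, metric names, and thresholds are hypothetical, not Tarjimly's actual configuration:

```python
import mlflow

# Log parameters and metrics for one model run so experiments are comparable.
with mlflow.start_run(run_name="random_forest_v2"):
    mlflow.log_param("model_type", "random_forest")
    mlflow.log_param("n_features", 42)        # hypothetical value
    mlflow.log_metric("f1_score", 0.94)
    mlflow.log_metric("class_1_recall", 0.56)

# Hypothetical trigger: widen the translator pool when supply is scarce
# or when the requester has had to ask repeatedly.
def pool_size(n_available_translators: int, n_prior_requests: int) -> int:
    base = 20
    if n_available_translators < 50:   # rare language pair: ping more people
        base *= 2
    if n_prior_requests > 3:           # repeat requester: escalate
        base += 10
    return min(base, n_available_translators)
```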

Machine Learning Models
We experimented with three types of models: logistic regression, random forests, and gradient boosted trees. These supervised learning models took the feature set we derived above as the independent variables and a response array (“yes” or “no”, 1 or 0) as the dependent variable. We experimented with 2- and 3-class models but ultimately decided on 2-class models. The idea behind the 3-class models was that a “no” response is different from a null response (a “no” means the user engaged with the request instead of just ignoring it), but the outcomes are ultimately the same.

Our random forest model with an expanded feature set had an F1 score of 0.94 and a class 1 recall of 0.56, which is really good. While most requests are eventually fulfilled, a large percentage of the individual pings sent out per request are ignored (this app leverages goodwill, after all), so we had to be mindful of class imbalance.

A really interesting part of this project was seeing the dichotomy between our machine learning performance indicators and practical metrics. For example, we were getting very low F1 scores (in the 0.2–0.3 range), but as long as just one translator answered a request, that was a successful match, and match rates were great! Why? When a request is made, say 20 translators are pinged, 19 say no, and 1 says yes: the F1 score is going to be low, but as long as at least one translator says yes on most requests, the match rate will be high. So we observed match rates in the 80–85% range! Here’s a confusion matrix of the results:

Confusion Matrix of model run on validation set
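For reference, a minimal sketch of this kind of modeling setup, using synthetic imbalanced data in place of Tarjimly's real features and hyperparameters:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered feature matrix and yes/no responses;
# the "yes" class is deliberately the minority, mimicking ignored pings.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))
y = (X[:, 0] + rng.normal(scale=2.0, size=5000) > 1.5).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0
)

# class_weight="balanced" is one way to be mindful of the class imbalance.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

preds = model.predict(X_val)
print(confusion_matrix(y_val, preds))      # rows: true class, cols: predicted
print(classification_report(y_val, preds)) # per-class precision/recall/F1
```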

Below you will find our final poster deliverable, which we presented at UCSF on 7/19/2019:

You can see the full lightning talk presentation here.

This was an incredible experience, and we look forward to continuing our work with high-impact nonprofits just like Tarjimly. Again, if you are interested in volunteering or making a financial contribution to Tarjimly, go here!
