Measuring Work-Mixture Changes Using Jensen-Shannon Divergence

Jon Simon, 4 Nov 2018

Here at Fin, we do so many different kinds of work for users, from scheduling haircuts to researching vacations, that a common question is "Are we doing the same mix of work now as we were doing a few weeks ago?"

While it is obvious when our work mixture changes dramatically, as it did in mid-September when we made it much easier for users to sign up for weekly phone syncs (light-blue region in Figure 1), most of the time the changes are much subtler.

We can think of our work mixture in any given week as a probability distribution p(x) across all types of requests that we handle. So for example, it might be that 9% of requests are to schedule a meeting, 3% are to book a flight, 5% are to make a restaurant reservation, etc.
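As a rough sketch of how one week of requests becomes such a distribution, the snippet below counts requests per work type and normalizes the counts. The function name `work_mixture`, the work-type labels, and the "other" bucket that pads the week to 100 requests are all made up for illustration; they are not Fin's actual schema.

```python
from collections import Counter

def work_mixture(request_types):
    """Turn a list of per-request work-type labels into a distribution p(x)."""
    counts = Counter(request_types)
    total = sum(counts.values())
    return {work_type: n / total for work_type, n in counts.items()}

# Hypothetical week of 100 requests, matching the percentages in the text;
# the "other" bucket is an assumption that fills in the remainder.
week = (
    ["schedule_meeting"] * 9
    + ["book_flight"] * 3
    + ["restaurant_reservation"] * 5
    + ["other"] * 83
)
p = work_mixture(week)
print(p["schedule_meeting"])  # fraction of requests that were meeting scheduling
```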

Figure 1. Distribution of user requests by work type.

In this case we can rephrase the question about how much our work mixture is changing over time as "How far away is the current probability distribution from the probability distribution from a few weeks ago?"

As it turns out, there are many ways to measure the distance between two probability distributions. In cases such as this, where work-types may be added or removed* and where there isn't a straightforward way to quantify the "closeness" of individual work-types**, a natural distance measure is the Jensen-Shannon (JS) divergence.

Given two probability distributions p(x) and q(x), the JS divergence between the two distributions is defined as

    JS(p || q) = (1/2) KL(p || m) + (1/2) KL(q || m),    with m(x) = (p(x) + q(x)) / 2

where KL(p || q) is the Kullback–Leibler (KL) divergence, which can be understood as the amount of additional information required to specify observations drawn from the distribution p(x) if we base our coding scheme on the distribution q(x). (For additional intuition about the KL divergence, refer to one of the many, many, many explanations available online.)

The JS divergence is a symmetrized version of the KL divergence, where we are attempting to describe both the distribution p(x) and the distribution q(x) using a "blended" distribution m(x).

Representing p(x) and q(x) as arrays, we can compute the JS divergence in Python as:
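A minimal sketch of that computation, assuming p and q are equal-length sequences of non-negative numbers that each sum to 1 (plain Python lists stand in for the arrays here). The function names `kl_divergence` and `js_divergence` are illustrative, and the choice of log base 2, which measures the result in bits, is an assumption.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits. Entries where p is 0 contribute nothing to the sum."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """JS divergence in bits between two equal-length distributions."""
    # Blended distribution m(x); it is nonzero wherever p or q is nonzero,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # identical mixtures
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # mixtures with no work types in common
```

With base-2 logarithms the result lies between 0 (identical distributions) and 1 (no overlap at all), and swapping the two arguments does not change it.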

Comparing the distribution of work each week to the distribution at the start of July, the surge in phone syncs in mid-September clearly stands out. However, we also notice a more subtle shift in mid-August, which corresponds to when we began catering more heavily to business users.

Figure 2. JS divergence of work-mixture compared to July 1st.
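The week-by-week comparison behind a plot like this could be sketched as follows. Each week's mixture is stored as a dict from work type to probability, so a work type that is missing from one week is simply treated as probability 0. The work-type names and all of the numbers are invented for illustration and are not Fin's data; `js_divergence` here is a hypothetical dict-based helper.

```python
import math

def js_divergence(p, q):
    """JS divergence in bits between two dicts mapping work type -> probability."""
    total = 0.0
    # Take the union of work types so that added or removed types count as 0.
    for work_type in set(p) | set(q):
        pi = p.get(work_type, 0.0)
        qi = q.get(work_type, 0.0)
        mi = (pi + qi) / 2  # blended distribution m(x)
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / mi)
    return total

# Hypothetical baseline and weekly mixtures (made-up numbers).
baseline = {"schedule_meeting": 0.5, "book_flight": 0.3, "restaurant": 0.2}
weekly = {
    "week_1": {"schedule_meeting": 0.5, "book_flight": 0.3, "restaurant": 0.2},
    "week_2": {"schedule_meeting": 0.4, "book_flight": 0.2, "phone_sync": 0.4},
}

# Divergence of each week from the baseline; larger means a bigger shift.
drift = {week: js_divergence(baseline, mix) for week, mix in weekly.items()}
for week, value in drift.items():
    print(week, round(value, 3))
```

In this toy data, week_2 both drops a work type and introduces a new one, which is exactly the situation where the KL divergence alone would blow up but the JS divergence stays finite.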

If you thought this was cool, and would like a chance to dive into our data yourself, you should apply to be a data scientist at Fin!


* If work-types were only added, and not removed, we could use the Kullback–Leibler (KL) divergence.

** If there existed a natural way of measuring distances between work-types, we could use the Wasserstein metric.