July 22, 2020

Driving success metrics with an operations flywheel

Fin Analytics is a toolkit we developed over years of experience running an on-demand executive assistance service. We came from a consumer technology background and were surprised by the lack of analytical tools when we entered the domain of human operations knowledge work.

In consumer tech, marketing, and engineering, there are sophisticated tools (like Mixpanel, Google Analytics, and New Relic) that help you find the biggest opportunities for driving change in the high-level success metrics you care about. We found nothing like this in the world of human operations, so we built Fin Analytics.

The playbook we used to drive continuous improvement in the success metrics for the human operations team behind the Fin Assistant service consisted of a ‘flywheel’ process: (1) Identify Outliers, (2) Perform Root Cause Analysis, (3) Discover Correlated / Funnel Metrics, (4) Drive these Funnel Metrics with changes to process, coaching, training, tools, or automation.

Our approach was informed and inspired by the instrumentation / profiling process for debugging a broken or slow piece of software.

0. Define Success Metrics

Most operations teams (especially front office teams) have a relatively standard set of KPIs or high level success metrics. These include some sort of metric to capture the quality of customer experience like CSAT or NPS. There are latency metrics like Resolution Time and Wait Time. And there are efficiency metrics like Avg Handle Time and Close Rate for individuals and for teams.

CRM software services like Zendesk provide comprehensive reports of these standard high level KPIs, which answer questions like:

  • How is your organization doing this quarter vs last?
  • How are the holidays affecting customer wait times?
  • Who on your team needs help and training?

What these high level KPIs do not tell you is:

What is the next best tactical opportunity to drive improvement in your organization?

You are left in a state equivalent to knowing that your iOS app is slow, or that the overall conversion rate on your checkout is 20%, without knowing why.

To prioritize what to do next, you need a map of the root causes and agent behaviors behind the high-level metrics you care about. Fin Analytics is a profiling tool designed to give you this information, so you can decide where and how your organization should invest in the next set of improvements.

[Figure: QA Focus]

Here is the process we have seen work over and over for using deep analytics to drive high level performance improvements in operations organizations:

1. Identify Outliers

First, identify outliers. Our pithy maxim about operational metrics is:

Averages are not that useful. In complex systems composed of diverse actors, it’s all about distributions.

Averages might tell you that your team is doing better (or worse) this week compared to last, but they don’t tell you why.

Slicing data across dimensions (like agent tenure, workflow type) and then looking at distributions and outliers is the first step to understanding the problem.

Once you have segmented your data, identify the outliers relevant to the high level KPI you are analyzing.
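As a minimal sketch of this slicing step, the snippet below groups handle times by workflow type and agent tenure and summarizes each segment's distribution. The record layout, bucket names, and numbers are made up for illustration; they are not Fin Analytics output.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical ticket records: (workflow_type, agent_tenure_bucket, handle_minutes)
tickets = [
    ("refund", "0-3mo", 12), ("refund", "0-3mo", 41), ("refund", "0-3mo", 15),
    ("refund", "12mo+", 9), ("refund", "12mo+", 10), ("refund", "12mo+", 11),
    ("billing", "0-3mo", 20), ("billing", "0-3mo", 22), ("billing", "12mo+", 18),
]

def summarize_by_segment(rows):
    """Group handle times by (workflow, tenure) and summarize each distribution."""
    groups = defaultdict(list)
    for workflow, tenure, minutes in rows:
        groups[(workflow, tenure)].append(minutes)
    return {
        segment: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "max": max(values),
        }
        for segment, values in groups.items()
    }

summary = summarize_by_segment(tickets)
for segment, stats in sorted(summary.items()):
    print(segment, stats)
```

In this toy data, the low-tenure refund segment has a mean well above its median, which is the kind of gap that points to a few long tickets worth inspecting individually.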

[Figure: Handle Time Outlier]

For efficiency problems, this might be tickets that took more than twice the average handle time for tickets of a certain case type.

For quality issues, this might be looking at tickets where customers gave a low score on a CSAT survey.
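Both outlier rules can be sketched in a few lines. The ticket fields, the 2x multiple as a parameter, and the CSAT threshold of 2 on an assumed 1 to 5 scale are illustrative choices, not a prescribed configuration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tickets: id, case type, handle time in minutes, CSAT (1-5, None if no survey)
tickets = [
    {"id": "T1", "case_type": "refund", "handle_min": 10, "csat": 5},
    {"id": "T2", "case_type": "refund", "handle_min": 8, "csat": None},
    {"id": "T3", "case_type": "refund", "handle_min": 42, "csat": 2},
    {"id": "T4", "case_type": "billing", "handle_min": 20, "csat": 4},
    {"id": "T5", "case_type": "billing", "handle_min": 22, "csat": 1},
]

def handle_time_outliers(rows, multiple=2.0):
    """Return ids of tickets that took more than `multiple` x their case type's average."""
    by_type = defaultdict(list)
    for row in rows:
        by_type[row["case_type"]].append(row["handle_min"])
    averages = {case_type: mean(values) for case_type, values in by_type.items()}
    return [
        row["id"] for row in rows
        if row["handle_min"] > multiple * averages[row["case_type"]]
    ]

def low_csat(rows, threshold=2):
    """Return ids of surveyed tickets with a CSAT score at or below `threshold`."""
    return [
        row["id"] for row in rows
        if row["csat"] is not None and row["csat"] <= threshold
    ]

efficiency_outliers = handle_time_outliers(tickets)
quality_outliers = low_csat(tickets)
print(efficiency_outliers, quality_outliers)
```

Note that each outlier is included in its own case type's average here, which raises the bar slightly; comparing against a median or a trailing baseline is a reasonable alternative.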

2. Perform Root Cause Analysis

Once you have a set of specific outliers, you can compare them to the normal cases and form hypotheses about the root cause of a problem:

  • Is it because of an update you made to your internal tools?
  • A new process rolled out to handle a particular workflow?
  • An incoming class coming online that shifted the average tenure of your agent population in a meaningful way?

You may be able to use pure quantitative analysis to confirm some of these hypotheses (for example, a change in average agent tenure affecting top level metrics). But often, the data alone does not answer the question.
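The tenure hypothesis is one that a quick quantitative check can often confirm. As a sketch under made-up data, the snippet below compares the overall average handle time across two weeks with the average inside each tenure bucket; the bucket names and numbers are assumptions.

```python
from statistics import mean

# Hypothetical tickets: (week, agent_tenure_bucket, handle_minutes)
tickets = [
    ("wk1", "veteran", 10), ("wk1", "veteran", 10), ("wk1", "veteran", 10), ("wk1", "new", 20),
    ("wk2", "veteran", 10), ("wk2", "new", 20), ("wk2", "new", 20), ("wk2", "new", 20),
]

def avg_handle(rows, week, tenure=None):
    """Average handle time for a week, optionally restricted to one tenure bucket."""
    values = [m for w, t, m in rows if w == week and (tenure is None or t == tenure)]
    return mean(values)

def new_agent_share(rows, week):
    """Fraction of the week's tickets handled by agents in the 'new' bucket."""
    in_week = [t for w, t, _ in rows if w == week]
    return in_week.count("new") / len(in_week)

for week in ("wk1", "wk2"):
    print(
        week,
        "overall:", avg_handle(tickets, week),
        "veteran:", avg_handle(tickets, week, "veteran"),
        "new:", avg_handle(tickets, week, "new"),
        "new share:", new_agent_share(tickets, week),
    )
```

Here the overall average rises while the average within each tenure bucket stays flat, so the change is explained by the shift in who is handling tickets rather than by anyone getting slower.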

This is where qualitative analysis is helpful.

Why did a certain ticket take 40 minutes to complete when the average for that ticket type is 10 minutes? If the answer is not in the data, you might look at the CRM artifact (i.e., the transcript) to try to reverse engineer what happened, but this often does not shed any additional light on the problem either.

This is exactly the scenario where full workday screen recordings are incredibly helpful. With Fin Analytics, you can just search by ticket id to find all instances where that ticket was worked on, and watch the ‘play by play’ of how each agent handled the ticket.

You can see things like:

  • The recommended resource or process was out of date, unclear, or incomplete.
  • The agent did not follow the recommended steps for solving the problem.
  • The agent second-guessed their answer and rewrote the response to the customer several times, but the first version was just fine, and they could have saved 10 minutes by having more confidence in themselves.
  • The internal tools the agent used were broken or slow.
  • The agent asked the manager on call for help on Slack and had to wait 10 minutes for an answer.
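If activity on a ticket is also logged as structured events, causes like these can be made visible by breaking one ticket's handle time down by activity. The sketch below assumes a hypothetical event log of (ticket id, agent, activity, start minute, end minute); it is not the Fin Analytics data model, and the activity labels are invented.

```python
from collections import defaultdict

# Hypothetical activity log: (ticket_id, agent, activity, start_minute, end_minute)
events = [
    ("T3", "ana", "read_playbook", 0, 4),
    ("T3", "ana", "internal_tool", 4, 12),
    ("T3", "ana", "waiting_on_slack", 12, 22),
    ("T3", "ana", "drafting_reply", 22, 40),
    ("T9", "ben", "drafting_reply", 0, 6),
]

def time_by_activity(log, ticket_id):
    """Total minutes per activity for one ticket, largest first."""
    totals = defaultdict(int)
    for tid, _agent, activity, start, end in log:
        if tid == ticket_id:
            totals[activity] += end - start
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))

breakdown = time_by_activity(events, "T3")
print(breakdown)
```

For this invented 40-minute ticket, most of the time goes to drafting the reply and waiting on Slack, which suggests which part of the recording to watch first.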

None of these root causes will show up in out-of-the-box CRM metrics, so most teams resort to periodic agent shadowing to uncover these problems. However, shadowing (1) is incredibly inefficient, because you spend most of the time reviewing the average case rather than the outlier case, and (2) introduces a ton of lag into process change, because it happens periodically and not as part of a real-time feedback loop.

3. Discover Correlated Funnel Metrics

Once you have identified the root cause, the next step is to figure out how to measure the behavioral patterns that correlate with the outcomes you want to encourage or prevent.

This might mean tracking things like:

  • Does an agent consult the correct resources when handling a ticket of a certain type?
  • Is each agent using all of the time-saving internal tools available to them?
  • Are agents making use of canned responses at every available opportunity?
  • Are you measuring agent attributes for case types with high variance (e.g., correlating agent tenure to CSAT or handle time)? Should you be incorporating these attributes into your work routing?
  • Do all of your case types and scenarios have workflows and playbooks?
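As one example of turning such a question into a funnel metric, the sketch below computes a canned-response usage rate per agent, counting only tickets where a canned response was available. How "available" and "used" get detected is left open; the record layout is an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical records: (agent, canned_response_available, canned_response_used)
tickets = [
    ("ana", True, True), ("ana", True, False), ("ana", False, False), ("ana", True, True),
    ("ben", True, False), ("ben", True, False), ("ben", True, True), ("ben", False, False),
]

def canned_response_usage(rows):
    """Per agent: share of tickets with an available canned response where it was used."""
    available = defaultdict(int)
    used = defaultdict(int)
    for agent, is_available, was_used in rows:
        if is_available:
            available[agent] += 1
            used[agent] += int(was_used)
    return {agent: used[agent] / available[agent] for agent in available}

usage = canned_response_usage(tickets)
for agent, rate in sorted(usage.items()):
    print(f"{agent}: {rate:.0%}")
```

Restricting the denominator to tickets where a canned response existed keeps the metric about agent behavior rather than about coverage gaps, which are a separate process problem.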

4. Drive Funnel Metrics

Once you have identified the root causes and the metrics to track them, there are a few types of changes you can make to drive these in the right direction (and consequently drive the top level KPIs they funnel into):

  1. Process Change. The root cause of high variance in a workflow (or ticket type) might be that 2 distinct workflows were incorrectly bucketed into one. In that case, you need to split it into 2 distinct workflows, each with its own explicit instructions. Or, maybe you just need more coverage of canned responses for scenarios within a workflow. Or, maybe you need to lower the latency of the floor manager’s response time to help line agents.
  2. Coaching. Operations work is ‘human-in-the-loop.’ Even with the best tools and processes, the best people can still make mistakes. This can happen even with veterans. Maybe they are accustomed to doing things the way that was right 6 months ago, but need to update their workflow to take advantage of the latest process and tooling.
  3. Training. In high growth companies, ops / cx orgs might be onboarding new classes of dozens of agents every week or two. You might double your workforce in less than a quarter. With a large percentage of low-tenure agents (and perhaps a large number of first time managers and trainers), it’s incredibly important to measure training practices and ensure the training process is constantly improving. Every agent class should have a shorter ramp time to reach the expected quality and efficiency KPIs than the class before it (their performance curves should asymptote earlier and earlier in their tenure).
  [Figure: Improving Training]
  4. Tools. Even for teams using an out-of-the-box CRM like Zendesk, agents often use internally built admin tools to access and manipulate customer data (e.g., for looking up orders or processing refunds). Watching video of tool usage may reveal critical bugs or severe usability issues that agents must work around in every case or in edge cases. Recording full video streams means your engineering team can debug these cases without wasting ops’ time asking them to reproduce bugs.
  5. Automation. Video and activity patterns can also reveal opportunities for automation or partial automation. At the full automation end of the spectrum might be a heuristic for auto-approving refunds based on certain attributes of a customer or of an order, totally bypassing human agent interaction. Partial automation might entail better canned responses that templatize dynamic data about a customer or order, or browser plugins or tool improvements that reduce 10 clicks to a single click.
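For the Training item in the list above, ramp time per agent class is straightforward to measure once you fix a target. The sketch below defines ramp time as the first tenure week in which a class's average handle time reaches the target; the class names, weekly numbers, and the 13-minute target are all assumptions.

```python
# Hypothetical weekly average handle time (minutes) per agent class, by tenure week
weekly_avg_handle = {
    "class_jan": [30, 26, 22, 18, 15, 12, 12],
    "class_mar": [28, 22, 17, 13, 12, 12, 11],
}

TARGET_MINUTES = 13  # assumed efficiency target for a fully ramped agent

def ramp_weeks(curve, target):
    """First tenure week (1-indexed) at or below target, or None if never reached."""
    for week, value in enumerate(curve, start=1):
        if value <= target:
            return week
    return None

ramp = {name: ramp_weeks(curve, TARGET_MINUTES) for name, curve in weekly_avg_handle.items()}
print(ramp)
```

If training is improving, later classes should show smaller ramp values than earlier ones, as the March class does in this made-up data.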

5. Rinse and Repeat

After addressing the biggest opportunity, just as with performance profiling in software engineering, the next step is to rinse and repeat. Find the next highest variance workflow or set of outliers, and repeat the steps of inspecting these outliers, identifying the root causes, developing correlated funnel metrics, and driving those metrics with changes in process, coaching, training, tools, or automation.
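One simple way to pick the next target is to rank workflows by the variance of their handle times. This is a sketch on invented data, and population variance is just one reasonable choice of spread measure.

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical tickets: (workflow, handle_minutes)
tickets = [
    ("refund", 10), ("refund", 8), ("refund", 42),
    ("billing", 20), ("billing", 22), ("billing", 21),
    ("shipping", 5), ("shipping", 15), ("shipping", 10),
]

def rank_by_variance(rows):
    """Return (workflow, variance) pairs sorted from highest to lowest variance."""
    groups = defaultdict(list)
    for workflow, minutes in rows:
        groups[workflow].append(minutes)
    ranked = [(workflow, pvariance(values)) for workflow, values in groups.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

ranking = rank_by_variance(tickets)
for workflow, variance in ranking:
    print(f"{workflow}: {variance:.1f}")
```

Raw variance favors workflows with long handle times, so dividing the standard deviation by the mean, or weighting by ticket volume, may give a fairer ranking depending on what you want to optimize.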


If this ‘flywheel’ process for continuous improvement sounds like what your organization wants to do more of, please get in touch about starting a pilot of Fin Analytics!
