Building Hybrid Intelligence Systems

Kortina, 20 Nov 2017


Fin 2017 Year in Review // Lessons Learned

We started Fin with the premise that mixing human and software intelligence could produce a service better than any pure software ‘assistant’ like Siri or any individual human. (Btw, if you’ve never heard of Fin, it’s a service that provides high-quality, on-demand assistance.)

Along the way, we’ve discovered that while hybrid intelligence systems can give you the best of both worlds, they can also give you twice the challenges you might have when dealing strictly with humans or with software alone.

We have learned a ton in the past year, and wanted to share some of the key lessons that have given us more confidence than ever that in the near future most work will be performed by human + software hybrid systems like this one.

Sections

  1. Shared memory tools make teams smarter and better than any individual can be alone.
  2. Checklists help even the best humans get things right.
  3. Using personal context to do the ‘right’ thing for each customer is table stakes for doing work as high quality as a human assistant would.
  4. Leverage data to reduce variance in human systems.
  5. Computers are better at math than humans.
  6. Humans are the universal API.
  7. Closing thoughts: hybrid intelligence systems should outperform pure software and isolated individual humans.

1. Shared memory tools make teams smarter and better than any individual can be alone.

We operate 24 x 7 x 365, and there is no guarantee that every request from a particular user (or even each part of a single request) gets routed to the same human operations agent on our team, so we can’t rely on any knowledge being ‘cached’ in a person’s brain.

And, because Fin is totally open ended (customers can ask for anything from “can you send flowers to my mom?” to “can you investigate how much it would cost to buy this racetrack?”), we cannot possibly train everyone on our operations team on every kind of request a customer might send in — we can’t even enumerate all the kinds of things someone might ask.

Consequently, we have invested deeply in tools for sharing knowledge about each customer, about each specific request as it gets handled by many people throughout its lifecycle, and about each kind of request the first time we perform an instance of it (eg, buying flowers).

There is some upfront cost to maintaining this ‘shared memory’ for our operations team, but this year we have started to realize many of the advantages we hoped it would deliver at scale:

(i) Because there is not a 1:1 mapping between customers and operations agents, your Fin assistant never sleeps, gets sick, or takes vacation, and is available 24 x 7 x 365.

(ii) Likewise, Fin can work on many requests in parallel for you, unlike a single human assistant.

(iii) While an individual human assistant only knows what they know / what they learn, Fin’s shared memory entails that any time one agent on the Fin team learns something new about the world or about a best practice, everyone else on the team instantly ‘learns’ or ‘knows’ the same thing, because the knowledge is encoded in our tools. So, for example, when one agent learns a new phone number for making reservations at a tough-to-book restaurant, or a more efficient way to book rental cars, or a great venue for a company offsite, or a cheaper place to get a certain product, every other agent and every customer of Fin benefits from this knowledge.

These network effects that result from our shared memory approach make the Fin team collectively more knowledgeable than a single individual member could be on their own.
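To make the mechanics concrete, here is a minimal sketch of a shared knowledge store in TypeScript. The names and shapes are invented for illustration (this is not our actual schema or code); the point is simply that a fact recorded by one agent is immediately visible to every other agent’s lookups.

    // Illustrative sketch only, not Fin's actual schema or code.
    type KnowledgeEntry = {
      topic: string;     // eg "restaurant:nopa"
      fact: string;      // eg "Reservation line answers after 3pm"
      authorId: string;  // the agent who recorded the fact
      updatedAt: Date;
    };

    class SharedMemory {
      private entries: KnowledgeEntry[] = [];

      // Any agent can record a new fact...
      record(entry: KnowledgeEntry): void {
        this.entries.push(entry);
      }

      // ...and every other agent's next lookup sees it immediately.
      lookup(query: string): KnowledgeEntry[] {
        const q = query.toLowerCase();
        return this.entries.filter(
          (e) =>
            e.topic.toLowerCase().includes(q) ||
            e.fact.toLowerCase().includes(q)
        );
      }
    }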

2. Checklists help even the best humans get things right.

All of the knowledge our operations team indexes about best practices, about customers, their preferences and relationships, and about the world is useless if we cannot find it and apply it at the right time.

Because Fin’s shared memory is constantly changing, we cannot simply train operations agents on everything before they start working. So, we store this information in a wiki-like format with a search index on top of it, where any agent can look up a document with the best practices for any kind of request or find relevant information about a customer on the fly.

This database and search index are not sufficiently reliable on their own, however, because it is easy for someone to miss a key step in a particular workflow when they only read through a how-to document. Or, even if they had thoroughly learned a workflow at some point, their knowledge may have become stale.

Over the past year, we have migrated much of our process knowledge into checklists, which, as The Checklist Manifesto famously describes, help even the most highly skilled professionals like surgeons and pilots dramatically reduce error rates. Our checklists ensure that operations agents do not miss any key steps in known workflows as they handle requests.

But, while surgeons and pilots know ahead of time a few specific checklists they need to use, we have hundreds of checklists comprising thousands of steps because of the breadth of work Fin does. This means that finding the right checklist to use (if one exists) is another problem we need to solve.

In addition to curating the content of checklists, our operations team is also responsible for managing the rules around what checklists to use when. They author these rules in a variant of JavaScript we call Finscript.

[Screenshot: Finscript rules mapping request tags to checklists]

We have NLP models that tag each request, and our operations team has mapped each tag to the relevant checklists with Finscript rules. By the time a human agent picks up a request, the relevant checklists are already in front of them, so they don’t have to search for the correct checklist (or even be aware that it exists).
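Finscript itself is internal and we have not published its syntax, so here is a hypothetical TypeScript rendering of the same idea: rules authored by the operations team map NLP-produced tags to checklists, and matching checklists get attached to a request before an agent ever opens it. All names below are made up for illustration.

    // Hypothetical sketch of tag -> checklist routing; not actual Finscript.
    type Checklist = { name: string; steps: string[] };

    // Rules authored by the operations team: each request tag maps to checklists.
    const rulesByTag: Record<string, Checklist[]> = {
      "restaurant-reservation": [
        {
          name: "Book a restaurant",
          steps: [
            "Confirm party size and time window",
            "Check the customer's dietary notes",
            "Confirm the booking back to the customer",
          ],
        },
      ],
      "send-flowers": [
        {
          name: "Send flowers",
          steps: [
            "Confirm recipient name and address",
            "Check the customer's budget preference",
            "Confirm the delivery date",
          ],
        },
      ],
    };

    // By the time an agent opens a request, these are already attached to it.
    function checklistsFor(nlpTags: string[]): Checklist[] {
      return nlpTags.flatMap((tag) => rulesByTag[tag] ?? []);
    }

    // checklistsFor(["restaurant-reservation"]) surfaces "Book a restaurant".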

[Screenshot: relevant checklists surfaced automatically alongside a request]

3. Using personal context to do the ‘right’ thing for each customer is table stakes for doing work as high quality as a human assistant would.

Probably the most critical type of knowledge stored in Fin’s shared memory is user context.

When you work with a great person for many years, this person gets to know all of your nuanced preferences — your communication style, price sensitivity, prioritization framework, when to take initiative vs. confirm something, the important people and places in your life, etc.

Acquiring this sort of deep knowledge of each customer is table stakes if Fin is going to be as good as (or better than) a human assistant. (As an aside, the atomic treatment of every request without any sort of memory / context is one of the most frustrating things about talking to many pure software voice assistants currently on the market — you can’t yet say, “Alexa, can you send Rob a link to that bucatini I ordered last week, and get me a few more bags?” Instead, you need to fully specify the parameters of every request each time you interact with Siri or Alexa.)

Over the years, Fin has gotten to know many dozens of nuanced preferences about me, like:

  • Default duration for work calls: 25 minutes.
  • Default spot for morning coffee meetings: Jackson Cafe.
  • Do not make reservations at seafood restaurants — Kortina does not eat fish.
  • When booking Barry’s Bootcamp, book Treadmill first.
  • Use Amex ••6000 for all purchases.
  • etc, etc, etc…

Likewise, Fin has learned tons of other important context about me that you wouldn’t necessarily call ‘preferences’:

[Screenshot: examples of stored user context]

It is by storing all of this knowledge in Fin’s shared memory that any agent who picks up a request from me can know that when I say ‘Rob’ I mean ‘Rob Cheung’ or when I say ‘nopa’ I mean the restaurant and not the neighborhood. (Relevant user context and preferences are, btw, surfaced automatically with Finscript in the same way that relevant checklists are, so an agent does not need to know to look for them.)
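Here is a minimal sketch of what that kind of resolution might look like (hypothetical names and data shapes, not our real model): per-customer context lets any agent resolve a shorthand like ‘Rob’ or ‘nopa’ to the entity the customer actually means.

    // Hypothetical sketch of per-customer entity resolution.
    type UserContext = {
      people: Record<string, string>;  // shorthand -> full identity
      places: Record<string, string>;  // shorthand -> disambiguated place
      preferences: Record<string, string>;
    };

    const kortina: UserContext = {
      people: { rob: "Rob Cheung" },
      places: { nopa: "Nopa (the restaurant, not the neighborhood)" },
      preferences: {
        "work call duration": "25 minutes",
        "morning coffee spot": "Jackson Cafe",
      },
    };

    // Fall back to the raw mention when no stored context matches.
    function resolve(ctx: UserContext, mention: string): string {
      const key = mention.toLowerCase();
      return ctx.people[key] ?? ctx.places[key] ?? mention;
    }

    // resolve(kortina, "Rob")  -> "Rob Cheung"
    // resolve(kortina, "nopa") -> "Nopa (the restaurant, not the neighborhood)"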

All of this context is just as critical to Fin doing the right thing when I ask for something as the preferences I explicitly enumerate to Fin as such.

4. Leverage data to reduce variance in human systems.

We have an internal principle: ‘AI’ at Fin means Always Improving. We try to design systems with reinforcing feedback loops that would self-improve given no new energy injected from external sources (eg, product development, tools, talent, money).

Adhering to this principle proves difficult given the vast breadth of heterogeneous work we do.

We start to make the problem tractable by measuring absolutely everything that happens in the Fin ecosystem, but even with the many terabytes of data we’ve captured in our Redshift cluster, it has at times been difficult to answer questions like:

  • How does our quality of recommendations today compare to 4 weeks ago?
  • How does our speed on scheduling requests compare to last week’s?
  • Did we spend longer on this request than we should have?

The difficulty in quantifying answers to these questions is due to request variance:

  • Category of request (eg, research vs. scheduling)
  • Complexity of request within the category
  • How clearly the request was specified by the customer
  • How many round trips the request took to complete

and also due to a vast array of other dimensions and factors like:

  • How many new operations agents were onboarded in the past few weeks? What percentage of the entire population did they comprise?
  • Was this work performed at the end of someone’s shift or the beginning?
  • Was this work performed on a redeye or a daytime shift?
  • Was the person performing this work having a bad day?
  • Were there any known technical glitches or service latency on a given day?
  • Was there some external latency (eg, an outage from an external service provider or long hold times for a new product release or restaurant opening)?

People ask us all the time (i) why we can’t charge a flat rate for requests of a given type or (ii) what the average charge is for a request of a given category.

We would love to charge a flat rate per request (and in the past we experimented with charging a flat monthly rate). The problem with this approach is that high outliers, in terms of either request complexity or heavy-usage customers, drive the averages way up, so under a flat rate model the effective prices for the most frequent requests and typical users would be higher.

You can see this by looking at some of the complexity distributions for the work we do, both in terms of the number of round trips or working sessions needed to complete a request and in terms of minutes spent per working session, across a few common categories. In all of these cases, the average falls way above the median.

[Charts: distributions of round trips per request and minutes per working session across common categories; in each, the average falls well above the median]
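As a toy illustration of why a heavy tail breaks flat pricing (made-up numbers, not our actual data): a few outliers pull the mean far above the median, so a flat rate set at the mean overcharges the typical request.

    // Toy numbers only: minutes spent on ten hypothetical requests.
    const minutes = [5, 6, 7, 7, 8, 9, 10, 12, 45, 120];

    const mean = minutes.reduce((a, b) => a + b, 0) / minutes.length; // 22.9
    const sorted = [...minutes].sort((a, b) => a - b);
    const median = (sorted[4] + sorted[5]) / 2; // 8.5

    // A flat rate priced at the mean (~23 minutes of work) would nearly
    // triple the effective cost of the typical (~8.5 minute) request.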

What’s not great about this from a customer perspective is that it makes the price for any given request unpredictable: historically, your price would vary based not only on the complexity of the work, but also on all of the external and environmental factors I listed above, like who specifically worked on your request and what kind of day they were having.

One of the areas we invested in deeply over the past year was making pricing more predictable by smoothing out variance due to all of these environmental factors. By studying thousands of requests of different types and complexity, we were recently able to update our pricing to essentially charge for the average time we have observed a request should take, removing environmental factors like who did it and when.

This has resulted in much smoother, more predictable pricing, which now is mainly a function of the complexity of work requested.
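A rough sketch of the idea, under stated assumptions (the linear model and names below are invented for illustration; the real model is learned from thousands of observed requests): price a request by the time it should take, not by the time it happened to take.

    // Illustrative only: a stand-in for the complexity model described above.
    type RequestFeatures = {
      category: "scheduling" | "research" | "purchase";
      roundTrips: number;
    };

    // Hypothetical expected-minutes model: a base cost per category plus a
    // per-round-trip increment. (The real model is learned from data.)
    function expectedMinutes(f: RequestFeatures): number {
      const base = { scheduling: 8, research: 20, purchase: 12 }[f.category];
      return base + 4 * f.roundTrips;
    }

    const RATE_PER_MINUTE = 1.0; // placeholder rate, not Fin's actual pricing

    // Charge for what the work should have taken, so the price no longer
    // depends on which agent did it or what kind of day they were having.
    function price(f: RequestFeatures): number {
      return expectedMinutes(f) * RATE_PER_MINUTE;
    }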

5. Computers are better at math than humans.

This one is a bit tongue-in-cheek, but worth mentioning since we frequently talk to people outside the company who think that Fin operates entirely on some extremely sophisticated NLP / deep learning / black box software that can perform far more complex tasks than Siri.

In reality, while we have a few language models running that automatically categorize requests and analyze customer sentiment, the models that are the most fundamentally valuable to our business are more numerical.

One example is the model I mentioned in (4), which looks at all the work that happened on a request and determines the complexity in terms of how many minutes the work should have taken to complete. Before we had this model, we had a purely manual quality review process, part of which asked the question, “how long should this request have taken to complete?” Having reviewed hundreds of request transcripts myself, I can personally attest that our model is far more accurate (and far faster) at answering this question than I am.

Another place we lean heavily on software is scheduling. We have one set of models that predicts how much work we expect customers to demand for each hour of each day of the next 4 weeks. Then, we have another model that takes this as input, along with other parameters describing a pool of operations agents available for work, each with unique constraints and preferences. This second model generates the optimal schedule given all of these inputs, and does a far better job than one of our operations leads could do with an Excel spreadsheet back in the early days.
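To sketch the shape of that scheduling problem (hypothetical types and a deliberately naive greedy solver; the real system treats this as a proper optimization with many more constraints): take a demand forecast per hour and a pool of agents with availability constraints, and staff each hour against the forecast.

    // Naive illustrative scheduler; the real system solves an optimization.
    type Agent = { id: string; availableHours: Set<number> }; // hours 0..23

    // forecast[h] = number of agents we expect to need during hour h.
    function buildSchedule(
      forecast: number[],
      agents: Agent[]
    ): Map<number, string[]> {
      const schedule = new Map<number, string[]>();
      for (let hour = 0; hour < forecast.length; hour++) {
        const assigned: string[] = [];
        for (const agent of agents) {
          if (assigned.length >= forecast[hour]) break;
          if (agent.availableHours.has(hour)) assigned.push(agent.id);
        }
        schedule.set(hour, assigned); // may come up short if supply is thin
      }
      return schedule;
    }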

6. Humans are the universal API.

This one is worth mentioning, because it’s the main reason Fin is able to handle such a wide breadth of tasks.

While we would certainly love to automate tasks like making purchases or booking restaurant reservations, there are no programmatic APIs we could use for most of the work our customers ask for. Not even all of the restaurants we book are on OpenTable.

One of the things that makes Fin as capable as a human assistant is that we can use the public internet, email or text anyone, or pick up the phone and call people to get things done for you.

[Chart: share of requests by outreach channel]

About 60% of requests involve emailing someone, 10% involve a phone call, 2% involve sending an SMS, and nearly every request involves using some internet service outside Fin.

This fact alone entails that there are a huge number of things Fin can do that a pure software assistant won’t be able to do for a very long time, if ever.

7. Closing thoughts: hybrid intelligence systems should outperform pure software and isolated individual humans.

We believe that hybrid systems that leverage great software and a network of humans with shared memory are the future of work. These hybrid systems should outperform what either pure software or isolated individual humans are capable of, on a number of vectors, like cost, efficiency, speed, availability, etc. Just as much computation has moved from self-managed hardware to networked clouds, we believe many other types of work today performed by individuals will migrate to hybrid systems like Fin in the coming years.

Looking back over the past year has been a fun exercise for us, and hopefully gives you a bit more of an understanding of what you can expect from Fin if you’re already a customer (or inspires you to give it a shot if you have not already).

If you’re interested in trying out Fin as a customer, sign up here.

If you’re interested in joining our amazing operations team, apply here.

If you’re interested in joining our engineering team, apply here.

If you’d like to come meet our team and talk about hybrid intelligence systems at our holiday party, RSVP here.