We started assessing Google AutoML a while ago, to see how it would complement our own internal ML, AI and natural language processing technologies. We were a little doubtful at the start but decided to give it a go, simply because Google so often gets things right.

By the way, a little word of warning before you read any further: this is NOT an academic or scientific article about Machine Learning or AI. You will NOT find complicated words and concepts that assume the reader has a PhD in neural networks. If that is what you are looking for, you can find excellent articles on fast.ai, Medium and many other places. What we are trying to do here is explain, in layman’s terms, how these new Machine Learning approaches, such as Google AutoML, can actually help end users achieve specific goals.

In our case, our “end-user goal” for Google AutoML is to analyse large volumes of customer service interactions, in the form of Zendesk tickets, Freshdesk tickets, Zopim chats, or Intercom conversations, to determine and trend the reasons why customers contact the support team. This is an area we understand well: all our customers are Cx managers who know they cannot rely on tags manually applied by agents and then hope to run a report at the end of the week showing which customer issues are trending. Manual tags are invariably wrong, or not applied at all, so the reports are wrong too.

So what we are doing here is using Google AutoML to automatically classify and categorise Zendesk, Freshdesk, Zopim and Intercom interactions to determine why customers are contacting the customer service team. Sounds simple enough? Well actually, it is.

The process we went through has 5 steps:

1. Select a large set of tickets or chats from Zendesk, Freshdesk, Zopim or Intercom.

2. Annotate them manually: what this means, in layman’s terms, is manually applying a label to each customer query to generate what we call a “training dataset”, which teaches Google AutoML how to adapt its model to classify this kind of data. For example, after applying the label “Order_not_received” to hundreds of sentences such as “Did not receive my order”, “Parcel never arrived” and “no delivery today”, we hope that the next time we show it a sentence like “I did not get my parcel”, Google AutoML will recognise it and apply the same label.

Cx MOMENTS facilitates and accelerates this greatly with our AI-assisted categorisation, which pre-filters candidate sentences for each label.
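To make this concrete, here is a sketch of what a few rows of such a training dataset look like, in roughly the CSV shape Google AutoML accepts for text classification. The optional first column assigns a row to the TRAIN or TEST split, and the labels here are just illustrative examples:

```
TRAIN,"Did not receive my order",Order_not_received
TRAIN,"Parcel never arrived",Order_not_received
TRAIN,"no delivery today",Order_not_received
TRAIN,"I was charged twice for the same item",Billing_issue
TEST,"I did not get my parcel",Order_not_received
```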

3. Review the training/test results: Google AutoML sets aside about 10% of your training data as “test data”. This means it will train its model on 90% of your training dataset and use the remaining 10% to verify that its model is right. You then get test statistics such as precision and recall. If you don’t know these words: “precision” is basically, of the interactions the model did label, how many it got right; “recall” is how many of the interactions that should have been labelled it actually caught.
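To illustrate with toy numbers (made up purely for this example, not from our real data):

```python
# Suppose the test set contains 120 tickets truly about a missing order,
# and the model labels 100 tickets "Order_not_received", 90 of them correctly.
predicted = 100  # tickets the model labelled Order_not_received
correct = 90     # of those, tickets actually about a missing order
actual = 120     # tickets in the test set truly about a missing order

precision = correct / predicted  # 0.90: when the label is applied, it is right 90% of the time
recall = correct / actual        # 0.75: the model catches 75% of all missing-order tickets

print(f"precision={precision:.2f}, recall={recall:.2f}")
```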

4. Bring the model live: If you are happy with the training results, you can then put the model live in production to automatically categorise new customer interactions.
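For illustration, here is a minimal sketch of classifying one new ticket against a live model with Google's Python client (google-cloud-automl). The project ID, model ID and ticket text are placeholders, and the exact calls may vary slightly between versions of the client library:

```python
from google.cloud import automl

PROJECT_ID = "my-gcp-project"        # placeholder, not a real project
MODEL_ID = "TCN1234567890123456789"  # placeholder, not a real model

prediction_client = automl.PredictionServiceClient()
model_name = automl.AutoMlClient.model_path(PROJECT_ID, "us-central1", MODEL_ID)

# A new, unseen customer interaction to classify.
ticket_text = "I did not get my parcel"
payload = automl.ExamplePayload(
    text_snippet=automl.TextSnippet(content=ticket_text, mime_type="text/plain")
)

response = prediction_client.predict(name=model_name, payload=payload)
for result in response.payload:
    # Each entry is a candidate label with a confidence score.
    print(result.display_name, result.classification.score)
```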

5. Improve the model: This is a key part of Machine Learning. Once you think you have arrived, you are actually only starting! The model is never perfect to start with, and things change. So as users start actually “using” Google AutoML classification in a production system, they will invariably find problems: this customer interaction is about something else, or that one was not classified at all. Google AutoML makes it relatively simple and very effective to apply these corrections and constantly make the model better. All you need to do is manually tag these few examples with the right labels and feed them back to Google AutoML as an incremental training dataset. The model retrains and starts producing better results. We have gone through a few such iterations, and the results are very impressive.
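As a sketch of that feedback loop: one way to feed corrections back is to export the re-tagged examples as a CSV to Cloud Storage and import them into the existing dataset before retraining. Note that even this API route still goes through a CSV file, which ties into one of our gripes below. All IDs and paths here are placeholders:

```python
from google.cloud import automl

PROJECT_ID = "my-gcp-project"          # placeholder, not a real project
DATASET_ID = "TCN9876543210987654321"  # placeholder, not a real dataset

client = automl.AutoMlClient()
dataset_name = client.dataset_path(PROJECT_ID, "us-central1", DATASET_ID)

# CSV of corrected examples, in the same format as the training file.
gcs_source = automl.GcsSource(input_uris=["gs://my-bucket/corrections.csv"])
input_config = automl.InputConfig(gcs_source=gcs_source)

# import_data is a long-running operation; result() waits for completion.
operation = client.import_data(name=dataset_name, input_config=input_config)
operation.result()
print("Corrections imported - retrain the model to pick them up.")
```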

So what is the good and the bad about Google AutoML?

The good:

– Google AutoML uses transfer learning. In layman’s terms, this means it comes already trained on a large amount of data and only needs a relatively small amount of your data to adapt its model to your use case.

– The workflow is relatively simple to use: a UI lets you annotate manually, or you can load training data via a CSV file.

– The API is well documented and, once a model has been trained, makes it relatively easy to classify new sentences and tickets in production.

– Test results/scores are clear and well presented.

The bad:

– The Google AutoML workflow is limited: training data should be injectable directly via the API, not just via a CSV file.

– The UI is limited: we actually used our own UI to annotate and generate the training dataset.

– The test results & scores should be made available via an API, so they can be re-used in customer-facing systems.