There is a lot going on behind the scenes of our solution for customer support teams. You give us your unstructured data in the form of support conversations from email, messaging and chat, and we show you why your customers are contacting your support. But how do we do that? What is the process that turns all those thousands of messy tickets into actionable data you can use to improve your customers' experience and support?

Natural Language Processing

This is achieved through Natural Language Processing (NLP), a form of artificial intelligence that focuses on analyzing human language to draw insights, automate workflows and better serve customers.

In the past, the main issue preventing humans and computers from interacting seamlessly was a language barrier. Machines do not speak the same language as us; they understand binary code, a stream of ones and zeros that instructs a computer in completing its tasks. Systems have been built where, with the press of a button or the click of a mouse, our intentions are translated into that machine language almost instantly, allowing computers to carry out our wishes.

This has all changed with the birth of NLP, an advanced form of AI that has only recently become viable. NLP uses a variety of techniques to understand human language, including deep learning. Deep learning lets a system study patterns in data to deepen its understanding of language: huge amounts of labelled data are fed in to help the system identify relevant correlations, and language is broken down into smaller elemental pieces so the machine can learn how those pieces relate and work together. By doing this, the computer can ascertain the meaning behind a sentence.

Using natural language processing and machine learning, our AI is able to read support tickets and understand a customer's intent. We all speak differently, and there are plenty of ways to say and mean the same thing when we communicate with each other. That means artificial intelligence needs to be pretty intelligent to figure out all of the different ways there are to say something. So let's look at some of the steps Natural Language Processing takes to get to this point.

Tokenization

Tokenization is the process of breaking a stream of text up into words, phrases, symbols or other meaningful elements called tokens. The list of tokens then becomes the input for further processing. In English this is often done by looking at punctuation and spaces. It sounds easier than it is, however, because punctuation marks are often ambiguous. A period may denote an abbreviation, a decimal point, an ellipsis or part of an email address rather than the end of a sentence; about 47% of the periods in the Wall Street Journal corpus mark abbreviations.

When punctuation and similar clues are not consistently available, the segmentation task often requires fairly complex techniques, such as statistical decision-making, large dictionaries and consideration of syntactic and semantic constraints.
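As a minimal sketch, here is how this might look in Python using the open-source NLTK library, whose pretrained punkt model is one example of the statistical approach described above (the sample text is our own, purely for illustration):

```python
import nltk

nltk.download("punkt", quiet=True)  # pretrained statistical sentence model
# (some newer NLTK releases also need: nltk.download("punkt_tab"))

text = "Dr. Smith opened a ticket at 9.30 a.m. It is still unresolved."

# Sentence segmentation: the punkt model has learned to distinguish
# abbreviation periods ('Dr.', 'a.m.') from sentence-final periods.
for sentence in nltk.sent_tokenize(text):
    # Word tokenization: split each sentence into word and punctuation tokens.
    print(nltk.word_tokenize(sentence))
```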

Lemmatization

This is an algorithmic process that reduces the different forms of a word to its root form, or lemma, based on its intended meaning. The algorithm must therefore correctly identify the intended part of speech and meaning of a word in a sentence, as well as in the larger context of the surrounding text. For example, the words 'walked', 'walks' and 'walking' will all be transformed into the common base form 'walk' by the algorithm.
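A minimal sketch using NLTK's WordNet-based lemmatizer shows why the part of speech matters (the word list mirrors the example above):

```python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # lexical database the lemmatizer uses

lemmatizer = WordNetLemmatizer()

# Told that these are verbs, the lemmatizer maps every form to 'walk'.
for word in ["walked", "walks", "walking"]:
    print(word, "->", lemmatizer.lemmatize(word, pos="v"))

# Without the right part of speech the result changes: the default is
# noun, and 'walked' has no noun entry, so it comes back unchanged.
print(lemmatizer.lemmatize("walked"))  # 'walked'
```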

Topic Modeling

This helps machines understand the themes and meaning within a collection of text, and then quantify those themes statistically. Put simply, given that a support ticket is about a particular topic, you would expect particular words to appear in it more or less frequently: 'bark', 'bone' and 'fetch' will appear more often in documents about dogs, 'meow', 'mouse' and 'whiskers' will appear in documents about cats, and 'the' and 'is' will appear roughly equally in both.

A document typically concerns multiple topics in different proportions, so in a document that is 10% about cats and 90% about dogs, there would probably be about nine times more dog words than cat words. A topic model captures this intuition in a mathematical framework, making it possible to examine a set of documents or support tickets and discover, from the statistics of the words in each, what the topics or themes might be and what each document's balance of topics is.
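Here is a toy sketch using scikit-learn's Latent Dirichlet Allocation, one common topic-modeling algorithm (the four-document corpus below is our own and far smaller than anything you would model in practice):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the dog loves to fetch the bone and will bark at the door",
    "my dog will bark until I throw the bone for him to fetch",
    "the cat will meow and stalk the mouse, twitching its whiskers",
    "a mouse ran past the cat, which began to meow",
]

# Count word occurrences per document; common words like 'the'
# and 'is' are dropped as English stop words.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Ask the model to explain the corpus as a mixture of two latent topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Top words per topic; on a corpus this small the dog/cat split is
# illustrative rather than guaranteed.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[::-1][:4]]
    print(f"topic {i}:", top)

# Each row is one document's mixture over the two topics, the
# '10% cats, 90% dogs' balance from the paragraph above.
print(doc_topics.round(2))
```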

Semantic Analysis

Semantic analysis is the process of understanding natural language (the way that humans communicate) based on meaning and context. It starts by reading all of the words in a piece of content to capture its real meaning. It identifies the text elements and assigns them to their logical and grammatical roles, analyzes the context of the surrounding text, and analyzes the text structure to accurately pick the proper meaning of words that have more than one definition.
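The "assign each element its grammatical role" step can be sketched with the open-source spaCy library (this assumes its small English model has been installed with: python -m spacy download en_core_web_sm):

```python
import spacy

# Load the small English pipeline (tagger, parser and more).
nlp = spacy.load("en_core_web_sm")

doc = nlp("The customer could not reset her password after the update.")

# Each token gets a part of speech and a grammatical role in the
# sentence (subject, object, ...), plus the word it attaches to.
for token in doc:
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} head={token.head.text}")
```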

So, our AI not only looks at the words that a customer uses but also the context of those words and how the sentence around them is structured. This means that even if a customer uses a word, phrase, sentence or some slang that the machine has never seen before, it is able to understand what they are trying to say. Just as you might be able to figure out what I mean when I say "I'm making a serious amount of cheddar at my new job". You may not have heard the term 'cheddar' before, but I'm guessing you figured out what I meant (psst... it's slang for money). You were able to look at the context in which the slang term was used and see that if someone is making a lot of something at their new job, the likelihood is they mean money rather than cheese.
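You can watch a language model make the same context-driven guess with Hugging Face's fill-mask pipeline (a sketch assuming the transformers library and the publicly available bert-base-uncased model; the exact ranking of guesses varies by model):

```python
from transformers import pipeline

# Load a masked-language-model pipeline; downloads the model on first run.
fill = pipeline("fill-mask", model="bert-base-uncased")

# Hide the slang word and let the model guess it from context alone.
for guess in fill("I'm making a serious amount of [MASK] at my new job."):
    print(round(guess["score"], 3), guess["token_str"])

# The surrounding words typically push the model toward money-like
# guesses rather than anything to do with cheese.
```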

Conclusion

Turning vast amounts of unstructured text in the form of your support tickets into actionable data that you can derive insights from is not an easy task. A huge amount of work goes into understanding human language, and hopefully you now better understand a few of the steps that take this raw data and order and structure it in a way that lets you make real improvements to your support and customer experience based on evidence and data.