
Natural language processing in insurance

Author: Dr Alexey Mashechkin

Next in the series of articles on Data Science.

Find all articles and further information on the Data Science MIG page.

Introduction

Search engines, virtual assistants, spam filters, chatbots and translation tools all deal with text processing sooner or later. That, in particular, explains why Natural Language Processing (NLP) has become so popular in recent years. The insurance industry generates tons of paper and digital documents daily, so why not benefit from them instead of being buried under a giant pile of text?

NLP is a branch of artificial intelligence, today largely driven by machine learning, devoted to teaching computers to recognise what lies behind words, i.e. the meaning of the input sentences, and to help draw conclusions from it. As with the whole of data science, the idea of smart language processing is not new: the first attempts were made as early as the 1950s. But thanks to the exponential improvement in computer performance over time, some heavy calculations are now easily accessible without renting a supercomputer.

Importance and difficulties

Given the speed at which information spreads through social networks and other web-based channels, a poor client experience can destroy a company’s reputation very quickly. Using NLP, one can parse thousands of online reviews, detect shifts in sentiment and give the company early warnings about any changes and their drivers.

Expenses are another big topic. Billions are spent annually on interaction with clients, beginning with the first contact and ending with product support. Quite often this complicated and heterogeneous path can be optimised and accelerated by NLP, for example by automating a policy purchase and further interaction with a client through a smart chatbot. This not only saves money but also leads to a better client impression of the company and gives employees more time to focus on their primary tasks.

All in all, one can benefit a lot (you will see a few use-cases further in the article) from a correctly implemented NLP technique.

On the other hand, there are currently some serious constraints on a full implementation of NLP. The Holy Grail of NLP researchers is Natural Language Understanding (NLU): a computer should be taught how to read and understand a text in a human way, with all its subjects, relationships, desires, aims and other features. But no model exists today that can take care of all these details.

A multilingual environment is another issue. Did you know that in Africa alone there are more than 1,000 languages? Let’s imagine you are running text analysis for an international company with offices and clients located all over the world, from Toronto through Ashkhabad to Osaka. You may find that a dozen languages with different semantics, character sets and grammatical rules are being used to describe the same facts.

Spelling and grammar mistakes, words borrowed from other languages, slang and other newly coined forms do not make NLP tasks any easier either. If you think you have trouble understanding the latest text-speak, pity the poor computer!

Methods involved

Let's take a look at the most popular methods used in NLP and some of their components. It's easy to see that they are actually strongly interlinked with each other and create a common environment.

Syntax

Takes care of recognising/constructing a correct grammar structure through:

  • Lemmatisation – reducing words to their morphological root (i.e. dictionary) forms
  • Stemming – reducing words to their stems
  • Part-of-speech recognition – tagging each word to indicate, for example, whether it is a noun, verb or adjective
  • Sentence breaking and word segmentation – a dot doesn’t always represent the end of the sentence and words in Chinese don’t have spaces between them
  • Terminology extraction – recognition of terms for indexing, new expressions monitoring and so on
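To make two of the items above more concrete, here is a minimal sketch in plain Python of sentence breaking and stemming. It is a toy illustration only: the abbreviation list and the suffix rules are assumptions made up for this example, and the function names `split_sentences` and `naive_stem` are hypothetical. A production system would rely on a proper library rather than on rules like these.

```python
# Hypothetical abbreviation list for illustration; a real system needs a far larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ltd.", "inc.", "e.g.", "i.e.", "no."}


def split_sentences(text):
    """Split text into sentences, ignoring dots that belong to known abbreviations."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # A token ending in ., ! or ? closes the sentence unless it is an abbreviation.
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that has no closing punctuation
        sentences.append(" ".join(current))
    return sentences


def naive_stem(word):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    print(split_sentences("Dr. Smith filed a claim. It was paid on 3 May."))
    print([naive_stem(w) for w in ["claims", "claimed", "claiming"]])
```

Note that the stemmer maps "claims", "claimed" and "claiming" to the same stem without knowing anything about grammar, whereas lemmatisation would need a dictionary to return a proper word form.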

Semantics

Refers to the interpretation of words and sentences. Includes such approaches as:

  • Natural language generation – converts information stored in computers into readable language
  • Natural language understanding – transforms language into commands that a computer can understand
  • Named entity recognition – detects particular objects (locations, names and so on) in a given text
  • Sentiment analysis – determines the mood associated with text
  • Topic recognition – grouping text by its main subject
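As an illustration of the simplest possible form of sentiment analysis, the sketch below scores a client review against two small word lists. The word lists and the function name `sentiment_score` are assumptions invented for this example; real tools use much larger lexicons or trained models and cope with negation, sarcasm and context, which this sketch ignores.

```python
import re

# Tiny illustrative lexicons (assumed for this example, not taken from any real tool).
POSITIVE = {"fast", "helpful", "easy", "great", "friendly"}
NEGATIVE = {"slow", "rude", "denied", "terrible", "confusing"}


def sentiment_score(review):
    """Return a score in [-1, 1]: -1 is fully negative, 1 is fully positive."""
    words = re.findall(r"[a-z']+", review.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    total = positive_hits + negative_hits
    if total == 0:
        return 0.0  # no sentiment-bearing words found, treat as neutral
    return (positive_hits - negative_hits) / total


if __name__ == "__main__":
    print(sentiment_score("The claim process was fast and the agent was helpful."))
    print(sentiment_score("Slow response and a rude agent; my claim was denied."))
```

Averaging such scores over the reviews received each week would give a rough trend line, which is one simple way to think about the early warnings mentioned earlier.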

Speech

Automatic recognition of pronounced words and, conversely, transformation of text into speech.

Discourse

Includes text summarisation, recognition of dependent objects and classification of relationships between them.

There are a lot of libraries and packages for smart text processing. As starting points for getting into NLP coding, you can take a look at spaCy/NLTK if you prefer Python, or tm/OpenNLP if you write code in R.

It’s important to mention here that language coverage is slowly improving: for instance, spaCy supports 8 languages, and NLTK’s Snowball stemmer can work with 15 languages, including Finnish, Portuguese and Russian. Also, some scientific results, with academic papers and associated code, can be found at the NLP-progress portal.

Insurance examples

So how can NLP boost the performance of the insurance market? Let’s jump from theory to practice and take a look at a few real-life implementations:

Lemonade. Most of you have probably heard about this start-up, which operates online only and was founded by people from the IT world with zero insurance experience. According to Lemonade's statement, you can get a new policy within 3 minutes and receive a payment 1.5 minutes after a claim submission (their bot holds a record of 3 seconds spent on reviewing and paying the loss). Take a look at how a policy adjustment request is handled through their chatbot Maya.

ReacFin. This Belgium-based consultancy with actuarial roots used NLP to develop a tool for the French reinsurer CCR Re, which transforms unstructured digital and image reinsurance treaty data into a structured dataset (limits, Lloyd’s references, exclusions and so on).
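To give a feel for what turning unstructured treaty text into a structured record means, here is a deliberately simplified sketch based on regular expressions. It is not the method used in the tool described above, whose internals are not covered in this article; the wording patterns, the field names and the function `extract_treaty_fields` are all assumptions made for illustration. Real treaty wording varies far too much for two patterns to cover.

```python
import re

# Assumed wording patterns for this example only.
LIMIT_PATTERN = re.compile(
    r"limit of\s+(?P<currency>EUR|USD|GBP)\s*(?P<amount>[\d,]+)", re.IGNORECASE
)
EXCLUSION_PATTERN = re.compile(r"excluding\s+(?P<items>[^.]+)\.", re.IGNORECASE)


def extract_treaty_fields(text):
    """Pull a limit, its currency and a list of exclusions out of free text."""
    record = {"currency": None, "limit": None, "exclusions": []}

    limit_match = LIMIT_PATTERN.search(text)
    if limit_match:
        record["currency"] = limit_match.group("currency").upper()
        # Drop thousands separators before converting to an integer.
        record["limit"] = int(limit_match.group("amount").replace(",", ""))

    exclusion_match = EXCLUSION_PATTERN.search(text)
    if exclusion_match:
        # Exclusions are assumed to be separated by commas or the word "and".
        items = re.split(r",\s*|\s+and\s+", exclusion_match.group("items"))
        record["exclusions"] = [item.strip() for item in items if item.strip()]

    return record


if __name__ == "__main__":
    sample = (
        "This treaty covers property losses with a limit of EUR 5,000,000 "
        "per event, excluding war, nuclear risks and cyber attacks."
    )
    print(extract_treaty_fields(sample))
```

The fragility of such hand-written rules is exactly why trained NLP models, combined with text recognition for scanned documents, are attractive for this kind of task.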

Gamalon. A natural language AI platform focused on automated communication with customers, analysis of their support tickets and feedback from open-ended surveys.

Health Fidelity. Produces risk adjustment tools for insurers, trained on thousands of medical documents and health insurance claims. The latter carry a flag showing whether a claim was fraudulent or not, which helps insurers to detect fraud among their own clients.

Accenture. This consultancy developed a software product named Machine Learning Text Analyzer (MALTA). It promises to dig into all the incoming text an insurer receives through various channels (emails, chats, HR tickets, support forms, etc.), analyse it, classify it and either trigger specific processes set up for a particular subject or point the data flow to the correct agent.
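The classify-and-route idea can be sketched with a simple keyword count. This is an illustrative toy and not how MALTA works, which is not described in this article; the queue names, the keyword sets and the function `route_message` are hypothetical. A real product would typically use a trained text classifier instead of fixed keywords.

```python
import re
from collections import Counter

# Hypothetical queues and keywords, assumed for this example.
ROUTING_KEYWORDS = {
    "claims": {"claim", "accident", "damage", "loss"},
    "billing": {"invoice", "payment", "premium", "refund"},
    "policy_changes": {"address", "renewal", "cancel", "coverage"},
}


def route_message(message, default="general_inbox"):
    """Send a message to the queue whose keywords it mentions most often."""
    words = re.findall(r"[a-z]+", message.lower())
    scores = Counter()
    for queue, keywords in ROUTING_KEYWORDS.items():
        scores[queue] = sum(word in keywords for word in words)
    queue, best_score = max(scores.items(), key=lambda item: item[1])
    # Fall back to a default queue when no keyword matched at all.
    return queue if best_score > 0 else default


if __name__ == "__main__":
    print(route_message("I had an accident and want to file a claim for the damage"))
    print(route_message("Please send a refund for my last premium payment"))
    print(route_message("Hello, what are your opening hours?"))
```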

Taiger. Created a virtual assistant to take care of clients’ onboarding (including policy choice and recognition of uploaded documents) and claims handling (including recognition of claim application forms and extraction of important information from them). The company claims that a 75% reduction in total costs was achieved after deployment of its tool at “one of the largest insurance providers in Europe”.

NLP can also help in detecting claims that are potentially subject to subrogation, in analysing social media to get early insights on claims from the company’s portfolio (especially useful for corporate insurance), and in many other tasks where text in its various forms is the object of analysis.

Is it worth the hassle?

According to MarketsandMarkets’ research, the NLP market will keep growing and will be worth USD 16 billion by 2021.

In turn, insurance companies that are capable of controlling and analysing the continuously growing pool of unstructured data will certainly gain a strong competitive advantage in this industry.