By Frederic Valluet, Solutions Director, EMEA, Insurance & Healthcare at MarkLogic
Fraud is a perennial headache for insurers. Detection rates remain frustratingly low, and most fraud is only noticed after the money has gone, when it is too expensive to claw back.
While there are some interesting examples of insurers looking at claimants’ public posts on social media to detect claims fraud – running a marathon while claiming for whiplash, for example – most insurers need to start at the beginning: being able to search their own corporate data, such as databases, claims forms, call centre logs and email.
It seems a no-brainer that big data should make it easier to detect insurance fraud by unearthing patterns of behaviour and finding links across volumes of data. But while much has been said about the value of analytics for filtering and combing through data, it has been slow to take off.
The painful truth is that data scientists spend 50 to 80 per cent of their time collecting and preparing unruly data, rather than on the value-add of exploring it for the patterns and relationships that can help detect fraud.
This problem stems from the fact that data has evolved in silos, usually a legacy from departmental initiatives going back decades and often compounded by mergers and acquisitions over time. Furthermore, multiple copies of data spread out across silos lead to data integrity and usability issues.
On top of this, many businesses are still relying on yesterday’s database technology to solve today’s problems. In use for over 30 years, relational database management systems (RDBMSs) appear to be here to stay, but a long history doesn’t compensate for their lack of agility.
When relational databases first came into use, the corporate world didn’t have to account for data types such as PDFs of insurance forms, social media posts and video. Relational databases simply don’t work for many of today’s data types, forcing time-consuming and expensive data modelling and extract, transform, load (ETL) processes. Using these old-school databases, insurers face what can seem insurmountably complex data integration challenges.
But there is an alternative approach that can deliver on the promise of big data: connecting data from silos to provide a single, unified view. Using an operational data hub built on an Enterprise NoSQL database that supports both structured and unstructured data, insurance firms can integrate all of their siloed data into a solution designed specifically for today’s rapidly changing, multi-structured data applications.
An operational data hub acts as a virtual filing cabinet, creating a single, unified view of all claims data, whether structured or unstructured. Once this 360-degree view is in place, it is easy to search for connections between claims and to link different data sets.
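To make this concrete, here is a minimal sketch of what a unified claim record in such a hub might look like. The field names and structure are purely illustrative, not any particular vendor’s document model:

```python
# A hypothetical unified claim document: structured fields from the policy
# system sit alongside unstructured text from call-centre logs and scanned
# forms, so one record can be searched and linked as a whole.
claim_document = {
    "claim_id": "CLM-2016-004217",
    "policy_number": "POL-88231",
    "claimant": {"name": "J. Smith", "address": "12 High St, Leeds"},
    "incident": {
        "type": "whiplash",
        "date": "2016-03-14",
        "location": {"lat": 53.801, "lon": -1.549},
    },
    "amount_claimed": 4200.00,
    # Unstructured sources live in the same record, not a separate silo
    "call_centre_notes": "Caller was unsure of the exact time of the collision...",
    "scanned_form_text": "OCR text extracted from the paper claim form...",
}
```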
Choosing the right Enterprise NoSQL database for the operational data hub is key. Features on the checklist include integrated, Google-style search across all corporate data, and semantics capabilities. Semantics enables the discovery of facts and relationships in data, and provides context for those facts. Another requirement is enterprise-grade ACID compliance. With ACID support, even the largest datasets are processed consistently and reliably, so no data is ever altered or lost. Importantly, thanks to the scalability and agility of NoSQL, the system can also be quickly adapted, extended and enhanced to meet changing business and regulatory requirements.
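As a rough illustration of what semantics makes possible, the sketch below uses the open-source rdflib library to store a handful of invented facts as triples, then asks whether the witness on a claim also owns the garage that carried out the repair. The vocabulary and data are made up for the example:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/claims#")  # invented vocabulary for this sketch
g = Graph()

# Facts harvested from claim forms and company records
g.add((EX.claim42, EX.repairedBy, EX.garage9))
g.add((EX.garage9, EX.ownedBy, EX.person7))
g.add((EX.claim42, EX.witnessedBy, EX.person7))

# Does the witness on any claim also own the garage that did the repair?
q = """
SELECT ?claim ?person WHERE {
    ?claim ex:repairedBy ?garage .
    ?garage ex:ownedBy ?person .
    ?claim ex:witnessedBy ?person .
}
"""
for claim, person in g.query(q, initNs={"ex": EX}):
    print(f"{person} witnessed {claim} and owns the garage that repaired it")
```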
With an operational data hub it is possible to evaluate a claim in context, comparing it with similar transactions and previous claims in order to identify patterns. For example, with a holistic view of each claimant, the technology flags up whether that claimant has been involved in other claims. In the case of car claims, the same individual could reappear as a witness, a driver or a repair shop owner across different incidents, alerting the insurer to potential fraud.
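A simplified version of that cross-claim check, using invented records, might look like this:

```python
from collections import defaultdict

# Hypothetical party/role data distilled from the hub's unified claim documents
claims = [
    {"id": "CLM-001", "parties": {"J. Smith": "claimant", "A. Jones": "witness"}},
    {"id": "CLM-014", "parties": {"A. Jones": "driver"}},
    {"id": "CLM-027", "parties": {"A. Jones": "repair shop owner"}},
]

# Index every appearance of every person across all claims
appearances = defaultdict(list)
for claim in claims:
    for person, role in claim["parties"].items():
        appearances[person].append((claim["id"], role))

# Flag anyone who turns up in multiple claims, whatever the role
for person, roles in appearances.items():
    if len(roles) > 1:
        print(f"ALERT: {person} appears in {len(roles)} claims: {roles}")
```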
Insurers can then also take geographical considerations into account, such as the distance between the claimant’s home and the incident location, or the recurring involvement of the same local health professionals in whiplash claims.
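For the distance check, the standard haversine formula gives the great-circle distance between two points. The coordinates and threshold below are purely illustrative:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius of 6371 km

# Hypothetical rule: flag claims where the incident happened unusually far
# from the claimant's home address
home = (53.801, -1.549)      # Leeds
incident = (51.507, -0.128)  # central London
if haversine_km(*home, *incident) > 150:  # threshold chosen for illustration
    print("Incident location is far from the claimant's home - review")
```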
To date, investigators have had to focus their limited resources on large claims and recurring payments, such as those for work injuries. Yet technology can now take so much of the guesswork out of deciding which claims to investigate that it becomes cost-effective to pursue the many smaller claims, such as whiplash, that have often been ignored because of the cost and complexity of gathering the necessary information.
Insurance firms have to strike the right balance between using technology and human expertise. For example, current rates of fraud detection using solely human expertise are at best 10 per cent, and are often far lower. This means there is a lot of wasted time, effort and cost.
By using an operational data hub, insurers can analyse data, assign a risk score to each claim and alert the right people in real time. By holding back settlement of high-scoring, suspicious claims until they have been investigated, insurers will be in a position to turn that 10 per cent detection rate on its head.
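As a closing sketch, a risk score could be as simple as a weighted sum of fraud signals like those above. The signals, weights and threshold here are invented for illustration, not an actual scoring model:

```python
# Illustrative risk scoring: each detected signal contributes a fixed weight,
# and claims scoring above a threshold are held for investigation.
WEIGHTS = {
    "party_in_prior_claims": 40,
    "incident_far_from_home": 25,
    "recurring_health_professional": 20,
    "claim_soon_after_policy_start": 15,
}

def risk_score(flags):
    """Sum the weights of every signal that fired for this claim."""
    return sum(w for signal, w in WEIGHTS.items() if flags.get(signal))

flags = {"party_in_prior_claims": True, "incident_far_from_home": True}
score = risk_score(flags)
if score >= 50:  # illustrative threshold
    print(f"Score {score}: hold settlement and alert an investigator")
```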