6 essentials for fighting fraud with machine learning

November 19, 2019
Going far beyond traditional attack detection, sophisticated machine learning systems help organizations stay one step ahead of fraudsters.

We hear it all the time: Fraud prevention is hard because fraudsters continually change and adapt. The minute you figure out how to recognize and prevent one scam, a new one emerges to take its place.

Naturally, then, the best technology for fighting fraud is one that can change and adapt as quickly as the fraudster’s tactics. That’s what makes machine learning (ML) systems so well suited to fighting fraud. When well designed, they learn, adapt, and uncover emerging patterns without the over-adaptation that can result in too many false positives.

Traditionally, organizations have relied on rules-based systems to detect fraud. Rules employ if-then logic that can be thorough at uncovering known patterns of fraud. And although rules remain an important fraud-fighting tool, especially in combination with advanced approaches, they are limited to recognizing patterns you already know and can program into the logic. They’re not effective at adapting to new fraud patterns, uncovering unknown schemes, or identifying increasingly sophisticated fraud techniques.
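To make the if-then idea concrete, here is a minimal Python sketch of a rules-based check. The Transaction fields, rule names, and thresholds are hypothetical, chosen only for illustration; production rule engines are far richer. Note what it cannot do: a new scheme that stays under every threshold passes silently, which is the limitation described above.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    country: str
    home_country: str
    txns_last_hour: int


# Hypothetical rules: each pairs a name with an if-then test for one known fraud pattern.
RULES = [
    ("large_amount", lambda t: t.amount > 5_000),
    ("foreign_country", lambda t: t.country != t.home_country),
    ("high_velocity", lambda t: t.txns_last_hour > 10),
]


def triggered_rules(txn: Transaction) -> list[str]:
    """Return the names of all rules the transaction trips, in rule order."""
    return [name for name, rule in RULES if rule(txn)]


def is_flagged(txn: Transaction) -> bool:
    # If any rule fires, then the transaction is flagged for review.
    return bool(triggered_rules(txn))
```

For example, a $7,200 purchase abroad trips large_amount and foreign_country, while a $40 domestic purchase trips nothing.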

That’s why more and more industries are embracing ML and artificial intelligence (AI) for fraud detection. Recent research by SAS and the Association of Certified Fraud Examiners found that a mere 13% of organizations across industries use these technologies to detect and deter fraud. Another 25% plan to incorporate them into their anti-fraud programs over the next two years, which would take adoption from 13% to 38%, a jump of nearly 200%.


Supervised or unsupervised learning for fraud detection

So, how does it work? Simply put, ML automates the extraction of known and unknown patterns from data. Once it recognizes those patterns, it can apply what it knows to new and unseen data. The machine learns and adapts as new outcomes and new patterns are presented to it via a feedback loop.

In fraud detection, supervised ML models attempt to learn from identified records in data, often referred to as labeled data. To train a supervised model, you present it with both fraudulent and nonfraudulent records that have been labeled as such.

Unsupervised ML is different. When you don’t know which records are fraudulent, you ask the model to learn the structure of the data on its own. You simply present it with data, and the model attempts to understand the underlying structure and dimensions of that data.
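Without labels, a model can still learn what “normal” looks like and score departures from it. The sketch below uses a deliberately simple statistical profile (per-feature median and median absolute deviation) in place of richer unsupervised methods such as clustering or isolation forests; the features and history are hypothetical.

```python
import statistics


def fit_profile(records: list[list[float]]) -> list[tuple[float, float]]:
    """Learn each feature's median and median absolute deviation (MAD) from unlabeled records."""
    profile = []
    for column in zip(*records):
        med = statistics.median(column)
        mad = statistics.median([abs(v - med) for v in column])
        profile.append((med, mad or 1e-9))  # guard against a feature with zero spread
    return profile


def anomaly_score(x: list[float], profile: list[tuple[float, float]]) -> float:
    """Largest robust z-score across features; higher means more unusual."""
    # 1.4826 scales the MAD so it estimates a standard deviation for normally distributed data.
    return max(abs(v - med) / (1.4826 * mad) for v, (med, mad) in zip(x, profile))


# Hypothetical unlabeled history: [amount in $, transactions in the last hour].
history = [[42.0, 1], [55.0, 2], [38.0, 1], [61.0, 3], [47.0, 2], [50.0, 1], [44.0, 2]]
profile = fit_profile(history)
```

Here a $49 transaction scores well under 1, while a $900 burst of 14 transactions in an hour scores far higher, even though nothing in the data was ever labeled as fraud.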

Detecting fraud with ML: The components

To apply ML to fraud detection, at a minimum, you’ll need the following components:

  1. Data: As with all ML applications, quality data is foundational to building anti-fraud ML systems. Data sets are only growing larger, and as volumes increase, so does the challenge of detecting fraud. Thankfully, the adage that more data makes better models holds true for fraud detection. The make-or-break factor is having an ML platform that can scale as data and complexity increase.
  2. Multiplicity: There’s no single ML algorithm or method that works best for fraud detection. Success comes from the ability to try many different methods, testing variations and evaluating them against an array of data sets. That requires a toolkit with a variety of supervised and unsupervised methods, as well as a range of feature engineering techniques. Applying ML in novel ways, such as combining supervised and unsupervised methods in one system, is more effective than any single method alone.
  3. Integration: This seems an obvious must-have, but it remains a common roadblock to success in many organizations. Only 50% of all models developed ever make it into production, resulting in a lot of wasted effort. Once you have developed an ML model, the challenge becomes deploying it in an operational run-time environment. If your data is in Hadoop, it makes sense for your ML model to be applied in Hadoop. Similarly, if your data is streaming in real-time systems, you want an ML engine that can run in real time or in stream. Portability of the model and integration of the decision logic within operational systems are paramount to stopping fraud at scale, and as it occurs.
  4. White-boxing: Many ML methods and models are effectively black boxes. It is often very difficult (if not impossible) to explain to decision makers how the model arrived at its score or conclusion. But explaining the “what” and the “how” of ML systems is critical, particularly in highly regulated industries like financial services. This explainability is often referred to as “white-boxing” or interpretability, and it is essential for supporting model validation and governance processes.
  5. Ongoing monitoring: Ongoing monitoring of ML fraud detection systems is imperative for success. As populations and the underlying data shift, expect the predictive value of system inputs to degrade and erode overall performance. This isn’t unique to ML systems; rule-based systems face the same challenge. But newer ML methods can adapt to new and unidentified patterns as underlying changes occur. This reduces some, but not all, of the need for retraining and evaluation. A good monitoring program registers and tracks the ongoing efficacy of all models.
  6. Experimentation: Successful ML programs include an element of ongoing experimentation. It isn’t enough to build an ML model and let it crunch. Fraudsters are clever, and technology changes quickly. A sandbox where data scientists can freely experiment with a variety of methods, data, and techniques to combat fraud has become a critical aspect of top anti-fraud programs. Investments that expand the capacity of data scientists who fight fraud can yield almost immediate payback.
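For the ongoing monitoring point, one widely used way to detect shifting populations is the population stability index (PSI), which compares the distribution of a model input or score at training time with its distribution in production. This is an illustrative choice, not a metric the article prescribes; the bucket edges and scores below are made up. A common rule of thumb in credit scoring treats PSI below 0.1 as stable and above 0.25 as a significant shift worth investigating.

```python
import bisect
import math


def population_stability_index(expected: list[float], actual: list[float], edges: list[float]) -> float:
    """PSI between a baseline sample and a recent sample, bucketed by the given edges."""

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1  # bucket index for v
        # A small floor keeps empty buckets from producing log(0).
        return [max(c / len(values), 1e-4) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical model scores at training time versus in recent production traffic.
baseline_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
recent_scores = [0.7, 0.8, 0.85, 0.9, 0.95, 0.6, 0.78, 0.99]
edges = [0.25, 0.5, 0.75]
psi = population_stability_index(baseline_scores, recent_scores, edges)
```

Tracking a value like this for every model input and output on a schedule is one way a monitoring program can flag when retraining or review is due.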

Balancing detection and customer experience

Identifying nefarious transactions while delivering quality customer service is a delicate balancing act. An organization that frequently declines legitimate transactions or makes its authentication measures too cumbersome is apt to lose customers. ML systems are ideal for minimizing this type of friction.
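The balancing act comes down largely to where you set the decision threshold on a fraud score. This hypothetical sketch computes, for a given threshold, the share of fraud caught and the share of legitimate transactions declined; the scores and labels are invented.

```python
def threshold_tradeoff(scores: list[float], labels: list[int], threshold: float) -> tuple[float, float]:
    """Return (fraud detection rate, false-positive rate) when declining scores >= threshold."""
    flagged = [s >= threshold for s in scores]
    frauds = sum(labels)                 # labels: 1 = fraud, 0 = legitimate
    legit = len(labels) - frauds
    caught = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    declined_legit = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    return caught / frauds, declined_legit / legit


# Hypothetical model scores with the true outcome for each transaction.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
labels = [0, 0, 0, 0, 1, 0, 1, 1]
```

On this toy data, a threshold of 0.3 catches all three frauds but declines 40% of legitimate transactions, while 0.6 declines none of them and misses one fraud in three.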

For example, one global financial institution recently worked with SAS to modernize its rule-based fraud detection system and help strike a balance between oversight and customer service. To do this, the bank implemented an ML-based solution from SAS that uses an ensemble of neural networks to create two different fraud scores:

  1. A primary fraud score, evaluating the likelihood that an account is in a fraudulent state.
  2. A transactional score, evaluating the likelihood that an individual transaction is fraudulent.
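The article does not say how the bank combined its two scores. Purely as a hypothetical sketch, a decision policy could use the account-level score to add context to the transaction-level score, escalating to step-up authentication rather than an outright decline when the evidence is ambiguous. The thresholds and actions below are assumptions, and the scores would come from the models themselves, not from this code.

```python
from enum import Enum


class Action(Enum):
    APPROVE = "approve"
    STEP_UP = "step_up_authentication"
    DECLINE = "decline"


def decide(account_score: float, transaction_score: float,
           high: float = 0.9, review: float = 0.6) -> Action:
    """Hypothetical policy combining an account-level and a transaction-level fraud score."""
    if account_score >= high and transaction_score >= review:
        return Action.DECLINE   # account looks compromised and the payment is suspicious
    if transaction_score >= high:
        return Action.DECLINE   # payment looks fraudulent on its own
    if account_score >= review or transaction_score >= review:
        return Action.STEP_UP   # ambiguous: ask the customer to verify instead of declining
    return Action.APPROVE
```

For instance, decide(0.95, 0.3) returns Action.STEP_UP: the account looks risky, but the payment itself does not, so the customer is asked to verify rather than being declined.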

Using this dual-score approach, the financial institution correctly recognized as legitimate nearly $1 million in monthly transactions that had previously been flagged as fraud. It also found an additional $1.5 million per month in fraud that had previously gone undetected.

Bringing it all together

Fraud detection is a challenging problem. Although fraudulent transactions represent a very small fraction of activity within an organization, a small percentage of activity can quickly turn into big dollar losses without the right tools and systems in place. With advances in ML, systems can learn, adapt, and uncover emerging patterns for preventing fraud, so you can keep up with fraudsters even as they evolve and change tactics.

This article was written by Stu Bradley and first appeared on technologyreview.com on 18 November, 2019.