Machine learning: Is it your best bet for online fraud prevention?

Noam Naveh
May 18, 2016

The fraud prevention industry is crowded with vendors offering machine learning solutions to online fraud. But is machine learning right for your business?

Machine learning is an umbrella term for the process of developing and using statistical models to make predictions based on past data. As an example, this may entail training a model on a large number of past transactions, each labeled “fraudulent” or “legit”, then asking the model to predict whether a new transaction is fraudulent or legit. The term “machine learning” is so sexy that it has become synonymous with a cure-all solution to fraud (and many other problems).

I strongly believe in applying machine learning to online transaction fraud. For large merchants or payment providers with enough data and resources, there is no better alternative. But it’s not for everyone. I recommend that you understand the prerequisites so you can make an informed decision. Here are questions to consider:

1. How well do you understand your fraud challenges?

It’s very tempting to think that machine learning can solve problems that are not well understood. The notion that machines can look at a bunch of data, detect hidden patterns and find elusive fraud indicators is based on truth but has been inflated by marketing to the level of hype. In practice, to deploy an accurate and stable fraud prevention model, you cannot escape the need to understand what the fraudsters are doing, correctly analyze the mistakes that the model makes, detect errors or missing data and identify ever-evolving fraud techniques. All of these depend on humans who are well versed in the fraud problem. Machine learning does not replace the need for fraud expertise.

2. Do you have enough data?

It’s hard to give figures here due to the wide variety of modeling techniques, different requirements for the accuracy of the model, and, naturally, different applications (think: account takeover on a mobile banking app vs. detecting a fraudulent seller in an online marketplace). However, statistical models make sense when there are, at the very least, hundreds of examples of any phenomenon that you’d like the model to learn. Not bad, right? Well, if only 1% of your transactions are fraudulent, this means a total population of transactions in the tens of thousands. And that’s just for the training phase - it doesn’t end there. You need transactions for testing the model and enough new transactions to frequently refresh the model.

Moreover, if your fraudulent transactions come in several flavors - as they often do - you need enough examples of each flavor that you want the model to detect. If the flavors represent very different types of fraud, you may need more than one model, and each one needs its own data. Learning machines are indeed data-hungry.
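To make the arithmetic concrete, here is a back-of-the-envelope sketch. The function name `transactions_needed`, the figure of 300 fraud examples (one reading of "hundreds") and the 20% share for a single fraud flavor are all assumptions for illustration, not recommendations. It only covers the training set; testing and refreshing the model need transactions on top of this.

```python
from fractions import Fraction
import math


def transactions_needed(min_fraud_examples, fraud_rate, flavor_share="1"):
    """Rough number of transactions needed to expect `min_fraud_examples`
    fraud cases of one flavor.

    fraud_rate and flavor_share are given as strings such as "0.01" so the
    arithmetic is exact and free of floating-point rounding surprises.
    """
    rate = Fraction(fraud_rate) * Fraction(flavor_share)
    if not 0 < rate <= 1:
        raise ValueError("fraud_rate and flavor_share must be in (0, 1]")
    return math.ceil(min_fraud_examples / rate)


# Assumed figures: 300 fraud examples wanted, 1% of transactions fraudulent.
print(transactions_needed(300, "0.01"))         # 30000
# Same, but the flavor of interest is an assumed 20% of all fraud.
print(transactions_needed(300, "0.01", "0.2"))  # 150000
```

Splitting fraud into flavors multiplies the requirement quickly, which is why a rare flavor can be out of reach for a model even when the overall volume looks healthy.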

3. How accurate is your data?

Even when you have many transactions, it’s hard to know for sure which of them are fraudulent and which are not. The indicators typically used to tell fraudulent from legit are riddled with inaccuracy. Chargebacks, supposedly the “bottom line” of fraudulent transactions, are not reliable: some of them stem from buyer mistakes or customer service issues rather than fraud. There is also “friendly” fraud, which requires special handling and should not be mixed with “identity” fraud. What about those transactions you declined because you were sure they were fraudulent and didn’t want to take the risk? What if some of them were actually legit?

The end result is that a small number of legit transactions are presented to the model as “fraudulent” (or vice versa) during the training process. Imagine how confused your child would be if you punished them - only occasionally - for tidying up their room! That’s how confusing it is for the model, which tries to generalize fraud indicators from a set of transactions labeled “fraudulent” that in fact contains mostly fraudulent ones and some legit ones. Modelers actually have a solution for all of this because, as statisticians, they expect errors. But the solution is… drum roll… more data. The effects of errors are diminished in bigger data sets. Nom nom nom says the data monster.
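A toy simulation can illustrate the point about bigger data sets. It is a sketch under made-up assumptions: a 10% true fraud rate among transactions sharing some indicator, a 2% chance that any label is wrong, and hypothetical function names. It measures how much the fraud rate seen in the labels wobbles from one sample to the next. The wobble shrinks as the sample grows; a consistent labeling bias would not go away this way.

```python
import random
import statistics


def observed_fraud_rate(n, true_rate, flip_prob, rng):
    """Share of n transactions labeled fraudulent when each true label
    is flipped (mislabeled) with probability flip_prob."""
    labeled_fraud = 0
    for _ in range(n):
        is_fraud = rng.random() < true_rate
        if rng.random() < flip_prob:
            is_fraud = not is_fraud  # a labeling error
        labeled_fraud += is_fraud
    return labeled_fraud / n


def spread_of_estimates(n, trials=30, true_rate=0.10, flip_prob=0.02, seed=0):
    """Standard deviation of the observed rate across repeated samples of size n."""
    rng = random.Random(seed)
    rates = [observed_fraud_rate(n, true_rate, flip_prob, rng) for _ in range(trials)]
    return statistics.stdev(rates)


small = spread_of_estimates(n=200)
large = spread_of_estimates(n=20_000)
print(f"spread with 200 transactions:    {small:.4f}")
print(f"spread with 20,000 transactions: {large:.4f}")
```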

4. Do you have the right data?

Let’s assume you have enough transactions, and talk about the data points that you have for each transaction. When you have all the right data, a well-trained fraud analyst can look at a transaction that just came in and tell, with a high degree of precision, whether it’s fraudulent or not. If she often gets it wrong, it is a clear sign that she doesn’t have the data she needs. Machine learning doesn’t solve that. You need to be collecting enough data from the transaction itself, and often from external data sources, to be able to tell the “goods” from the “bads” quickly and accurately. Also, if your fraud analysts regularly need to study Facebook profiles or make phone calls, they rely on data that cannot be consumed by a machine. In all these cases, your top priority is to go get more data and only then deal with decision-making technology. Machine learning can overcome human biases and inefficiencies when making decisions, but it needs the right data.

5. Do you understand the process?

Let’s imagine that an accurate and stable model magically lands in your lap and spews out a score for each of your transactions. Each score represents the probability that the transaction is fraudulent. Now your organization needs to figure out how to integrate this score into the payment flow and all its supporting processes. Examples:

The customer service agent needs to know what to tell a customer who calls to complain about a declined transaction. “Um... the model says your transaction’s score was 65” will not cut it.
Your analysts need to be able to work with the modelers to translate chargebacks and false positives (legit transactions that are declined) into a better model.
The product managers, modelers and engineers should have a process for keeping the model fresh as fraud evolves, or when you launch a new product. Fraud models that aren’t constantly refreshed, fixed and improved get inaccurate quickly, and then start missing fraud and insulting your good customers.

Even when the modeling expertise is provided by the machine learning vendor, it will still take quite an effort to integrate these processes into the daily life of your business.
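As a rough sketch of what integrating a score can look like, the snippet below maps a score to an action and to a plain-language message that a customer service agent could actually use. Everything in it is hypothetical: the `Decision` type, the `decide` function, the two cutoffs and the message wording are assumptions for illustration, and real cutoffs would come from your own analysis of fraud losses versus declined good customers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str         # "approve", "review" or "decline"
    agent_message: str  # plain-language text for customer service


# Hypothetical cutoffs, chosen only for this example.
REVIEW_THRESHOLD = 0.30
DECLINE_THRESHOLD = 0.65


def decide(score: float) -> Decision:
    """Translate a fraud probability into an action and an explanation."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    if score >= DECLINE_THRESHOLD:
        return Decision(
            "decline",
            "We could not verify this payment. Please try another "
            "payment method or contact your bank.",
        )
    if score >= REVIEW_THRESHOLD:
        return Decision(
            "review",
            "This payment is being checked and should be confirmed shortly.",
        )
    return Decision("approve", "This payment was approved.")


print(decide(0.65).action)  # decline
```

The point of the design is that the raw score never reaches the customer: each outcome carries wording that someone has agreed on in advance.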

Summary

Machine learning is a fantastic field that increasingly pushes the envelope of what computers can do for business. The application of machine learning to fraud prevention is a must in some cases, but not in others. It requires understanding and preparation. Like any fraud prevention approach, it is very far from a fire-and-forget solution. Ultimately, the decision about this technology needs to be made in the context of your particular business needs, your data and your fraud prevention strategy.
