
Credit Card Fraud: What the Data Reveals

An analytical exploration of fraud detection — why accuracy alone is misleading, and what the data actually shows about the tradeoffs banks face.

1. The Problem

Credit card fraud is rare. When it happens, it is costly. Banks must decide, in milliseconds, whether to approve or block each purchase.

The real tension is not just catching fraud. It is the cost of blocking a legitimate customer — who may abandon the brand or switch cards — versus the cost of missing fraud. Blocking good customers can sometimes hurt more than missing a single fraudulent charge.

2. The Data Reality

The dataset contains 284,807 transactions over two days. Only 492 are fraud, or 0.17%. That is roughly one fraudulent transaction for every 578 legitimate ones.

Rare events are hard to learn from. Most of what the system sees is “normal.” The challenge is recognizing the tiny fraction that is not — without treating every edge case as suspicious.

[Chart: 0.17% of transactions are fraud, a needle in a haystack.]

Takeaway: Fraud is a needle in a haystack. Any system that claims “99% accuracy” is mostly reporting how often it correctly classifies legitimate transactions — not how well it catches fraud.
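For readers who want to verify the imbalance, here is a minimal sketch, assuming the public Kaggle credit card fraud dataset saved locally as creditcard.csv, with a binary Class column (1 = fraud, 0 = legitimate):

    import pandas as pd

    # Assumed setup: the public ULB/Kaggle credit card fraud dataset,
    # saved locally as creditcard.csv.
    df = pd.read_csv("creditcard.csv")

    n_total = len(df)
    n_fraud = int(df["Class"].sum())

    print(f"Transactions: {n_total:,}")              # 284,807
    print(f"Fraud cases:  {n_fraud:,}")              # 492
    print(f"Fraud rate:   {n_fraud / n_total:.4%}")  # ~0.1727%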

3. First Exploration: Amounts

A natural question: Are fraudulent transactions the biggest purchases? If fraud skewed toward high amounts, we might prioritize blocking large transactions.

[Chart: fraud is spread across amount ranges and does not stand out by size alone.]

Takeaway: Fraud is spread across amount ranges. It hides in normal-sized purchases — small, medium, and large. Amount alone is a weak signal.
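One way to sanity-check this is to compare the Amount distribution across the two classes. A sketch against the same assumed dataset:

    import pandas as pd

    # Same assumed dataset as above: creditcard.csv with Amount and Class.
    df = pd.read_csv("creditcard.csv")

    # Percentiles of transaction amount, split by class. If fraud were
    # concentrated in large purchases, its percentiles would sit far
    # above the legitimate ones.
    summary = df.groupby("Class")["Amount"].describe(percentiles=[0.5, 0.95])
    print(summary[["mean", "50%", "95%", "max"]])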

4. Time Behavior

Does fraud happen at random times? Or does it cluster?

[Chart: fraud clusters in time, with testing sequences and burst activity.]

Takeaway: Fraud tends to occur in bursts and testing sequences — fraudsters probing cards in quick succession. Time is a stronger signal than amount.
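A rough way to probe for bursts is to look at the gaps between consecutive fraud timestamps. Again a sketch, assuming the dataset's Time column counts seconds from the first transaction:

    import pandas as pd

    # Same assumed dataset; Time is seconds elapsed since the first
    # transaction in the two-day window.
    df = pd.read_csv("creditcard.csv")

    fraud_times = df.loc[df["Class"] == 1, "Time"].sort_values()
    gaps = fraud_times.diff().dropna()

    # Random arrivals would spread these gaps out; bursts show up as a
    # pile-up of near-zero gaps between consecutive fraudulent charges.
    print(f"Median gap between frauds: {gaps.median():.0f} s")
    print(f"Frauds within 60 s of the previous one: {(gaps <= 60).mean():.1%}")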

5. Behavioral Patterns

Do fraudulent transactions look similar in feature space? A 2D projection of transaction features reveals whether fraud clusters or blends in.

[Chart: fraud forms a distinct cluster in feature space, detectable but overlapping with edge cases.]

Takeaway: Fraud forms a distinct cluster — detectable with the right features. But there is overlap with legitimate edge cases. Perfect separation is not possible.
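One quick way to produce such a projection: since the anonymized features V1..V28 are already PCA components, scattering the first two gives a crude 2D view. A sketch, not the exact projection behind the chart:

    import pandas as pd
    import matplotlib.pyplot as plt

    # The dataset's V1..V28 features are already PCA components, so the
    # first two give a quick (lossy) 2D view of feature space.
    df = pd.read_csv("creditcard.csv")
    legit = df[df["Class"] == 0].sample(5_000, random_state=0)  # thin out for plotting
    fraud = df[df["Class"] == 1]

    plt.scatter(legit["V1"], legit["V2"], s=4, alpha=0.3, label="legitimate")
    plt.scatter(fraud["V1"], fraud["V2"], s=8, label="fraud")
    plt.xlabel("V1")
    plt.ylabel("V2")
    plt.legend()
    plt.show()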

6. The Core Insight

The challenge is not predicting fraud. It is choosing when to act. Every decision involves two kinds of error: false positives (blocking a real customer) and missed fraud (letting a fraudulent charge through).

A system tuned to catch nearly all fraud will block many legitimate transactions. A system tuned to avoid blocking good customers will miss more fraud. The right balance depends on the relative costs — and those are business decisions, not data science outputs.
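That balance can be written down directly. A minimal sketch with made-up unit costs standing in for the business inputs; the false-positive and missed-fraud counts are taken from the slider demo in the next section:

    def expected_cost(false_positives: int, missed_fraud: int,
                      cost_per_block: float, cost_per_miss: float) -> float:
        """Total cost of a threshold choice. The two unit costs are
        business inputs; the data alone cannot supply them."""
        return false_positives * cost_per_block + missed_fraud * cost_per_miss

    # Illustrative numbers only: annoying a good customer costs $50,
    # eating a missed fraudulent charge costs $500.
    print(expected_cost(false_positives=597, missed_fraud=184,
                        cost_per_block=50.0, cost_per_miss=500.0))  # 121850.0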

7. The Tradeoff: Sensitivity vs. Precision

Below, a simplified model shows how changing the decision threshold affects outcomes. Move the slider to see how many fraud cases you catch — and how many legitimate transactions you flag.

[Interactive demo: move the slider to set sensitivity. Higher sensitivity catches more fraud but flags more legitimate transactions. At the setting shown: 308 of 492 fraud cases caught; 597 legitimate transactions flagged as false positives.]

Takeaway: No threshold is “correct.” The decision depends on the cost of blocking a real customer vs. the cost of missing fraud.
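For the curious, the same sweep can be reproduced offline. A sketch using logistic regression as a stand-in classifier, not the model behind the demo above:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Same assumed dataset as in the earlier sketches.
    df = pd.read_csv("creditcard.csv")
    X, y = df.drop(columns=["Class"]), df["Class"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    y_te = y_te.to_numpy()

    # A simple stand-in classifier; any model that outputs scores would do.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]

    # Sweep the decision threshold and report both sides of the tradeoff.
    for t in (0.9, 0.5, 0.1, 0.01):
        flagged = scores >= t
        caught = int((flagged & (y_te == 1)).sum())
        false_pos = int((flagged & (y_te == 0)).sum())
        print(f"threshold {t:>4}: {caught}/{int(y_te.sum())} fraud caught, "
              f"{false_pos} legitimate flagged")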

8. What This Means

Banks struggle with fraud systems because the problem is inherently uncertain. Perfect detection is impossible. The best systems minimize the combined cost of false positives and missed fraud — but that cost function is defined by strategy, not by the data alone.

Metrics like “99% accuracy” are misleading when the positive class is 0.17% of the data. A model that predicts “no fraud” every time would still be correct 99.83% of the time — and useless.
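To make the trap concrete, here is a tiny self-contained example with synthetic labels matching the dataset's balance:

    import numpy as np

    # Synthetic labels matching the dataset's balance:
    # 492 fraud (1) among 284,807 transactions.
    y_true = np.zeros(284_807, dtype=int)
    y_true[:492] = 1

    # The trivial model: predict "no fraud" for everything.
    y_pred = np.zeros_like(y_true)

    accuracy = (y_pred == y_true).mean()
    recall = y_pred[y_true == 1].mean()  # share of fraud actually caught

    print(f"Accuracy: {accuracy:.4%}")  # 99.8273% -- looks impressive
    print(f"Recall:   {recall:.4%}")    # 0.0000% -- catches nothing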

9. Limitations

This analysis is based on anonymized PCA features — the original transaction details (merchant, location, device) are hidden. In practice, banks use richer signals: user behavior, device fingerprinting, and merchant reputation. The dataset also lacks context like cardholder history or recent activity.

These limitations matter. Acknowledging them is part of responsible data interpretation.

10. Reflection

This project reinforced a simple idea: data tells a story, but the story depends on how you frame the question. Metrics alone do not answer “what should we do?” — they inform a decision that balances competing goals. The job of a data practitioner is to make that tradeoff visible, not to hide it behind a single number.
