Last updated: January 26, 2026

Sanctions Name Matching: Algorithm Limits and Manual Review

Lenzo Compliance Team

OFAC's SDN list contains 12,847 entries with over 47,000 aliases, transliterations, and name variations (Treasury.gov, December 2025). Your screening algorithm runs a customer name against that database and returns either a match, a potential match, or no match. The problem lives in that middle category — potential matches that could be your legitimate customer Mohammed Al-Rahman or could be sanctioned individual Mohammad Al-Rahman. No algorithm resolves that ambiguity automatically. That's where name matching hits its ceiling and manual review becomes the only path forward.

Key Takeaways

  • OFAC SDN entries average 3.7 aliases per individual; some entries contain 20+ name variations (Treasury.gov, 2025)
  • False positive rates for Arabic, Russian, and Chinese names run 3-5x higher than Western European names (industry benchmark data, 2025)
  • Fuzzy matching algorithms using Levenshtein distance miss phonetic variations that human reviewers catch
  • Manual review of a single screening hit takes 8-15 minutes for experienced compliance staff
  • 60-75% of screening hits at mid-market exporters turn out to be false positives requiring manual clearance (Lenzo data, 2025)

Why Do Name Matching Algorithms Produce So Many False Positives?

Name matching algorithms compare your input string against sanctioned entity names using similarity measures — exact match, phonetic matching, fuzzy string matching, token-based matching. Each method trades off between catching real matches and flooding your queue with false ones.

Exact matching misses too much. "Mohammad" doesn't match "Mohammed" or "Muhammad" or "Mohamad" — all common transliterations of the same Arabic name. A system that only does exact matching would clear transactions with sanctioned parties whenever the spelling differed by one letter. Nobody runs exact-only anymore.

Fuzzy matching catches more but flags everything. Levenshtein distance counts how many character edits separate two strings. Set the threshold tight and you miss spelling variations. Set it loose and "John Smith" starts matching "Joan Smith" and "John Smyth" and "Jon Smith" — none of whom appear on any sanctions list. Your compliance team now investigates three false positives for one common name.
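
A minimal sketch of that trade-off, using a plain dynamic-programming edit distance (the names and thresholds here are illustrative, not drawn from any list):

    def levenshtein(a: str, b: str) -> int:
        # Classic edit distance: minimum single-character insertions,
        # deletions, and substitutions needed to turn a into b.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                # deletion
                                curr[j - 1] + 1,            # insertion
                                prev[j - 1] + (ca != cb)))  # substitution
            prev = curr
        return prev[-1]

    # A threshold of 2 edits catches a real transliteration variant...
    print(levenshtein("mohammed", "muhammad"))  # 2
    # ...but the same allowance sweeps in unrelated lookalikes of a common name.
    for candidate in ("joan smith", "john smyth", "jon smith"):
        print(candidate, levenshtein("john smith", candidate))  # each within 1 edit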

Phonetic algorithms like Soundex and Metaphone help with names that sound similar but spell differently. "Mohammed" and "Muhammad" encode to similar phonetic values. But phonetic matching breaks down across languages. Arabic names transliterated to English don't follow English phonetic rules. Russian names romanized from Cyrillic create unpredictable variations. Chinese names in pinyin versus Wade-Giles produce completely different strings for the same person.
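
A simplified American Soundex sketch shows both sides of this: common Arabic transliterations collapse to one code, while different romanization systems for the same Chinese surname do not. This is an illustrative implementation, not the variant any particular screening vendor ships:

    def soundex(name: str) -> str:
        # Simplified American Soundex: first letter plus up to three digits.
        codes = {}
        for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                             ("l", "4"), ("mn", "5"), ("r", "6")):
            codes.update({ch: digit for ch in group})
        letters = [ch for ch in name.lower() if ch.isalpha()]
        if not letters:
            return ""
        out, prev = [letters[0].upper()], codes.get(letters[0], "")
        for ch in letters[1:]:
            digit = codes.get(ch, "")
            if digit and digit != prev:
                out.append(digit)
            if ch not in "hw":  # vowels break a run of repeats; h and w do not
                prev = digit
        return ("".join(out) + "000")[:4]

    print({soundex(n) for n in ("Mohammed", "Muhammad", "Mohamad")})  # {'M530'}
    print(soundex("Zhang"), soundex("Chang"))  # Z520 vs C520 - pinyin and Wade-Giles diverge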

The SDN list structure makes this worse. OFAC includes every known alias, every spelling variation, every historical name for designated individuals. One Iranian businessman might appear as "Hossein," "Hussein," "Hussain," "Hossain" — all in the official entry. An algorithm doing its job correctly should flag all four. But your legitimate customer named Hussein now triggers a hit every single time he orders.

What Types of Names Break Matching Algorithms?

Arabic names generate the most screening friction for US and EU exporters. No standardized romanization exists. Same Arabic script renders differently depending on who transliterates and which conventions they follow. Add patronymics — bin, ibn, al- — and you get exponential variation in how the same name appears across databases.

We tracked screening results across mid-market exporters through 2024. Arabic names generated 4.2x more false positive hits than Western European names at equivalent screening volumes. Russian names ran 3.1x higher. Chinese names hit 2.8x. The pattern held regardless of screening platform.

Compound names and ordering variations add another layer. "Mohammed bin Salman Al-Saud" might sit in your customer database as "Al-Saud, Mohammed" or "M. Salman" or "Mohammed Salman." An algorithm needs to handle every permutation. Most handle them badly. We've seen legitimate Gulf customers get flagged repeatedly because their name appeared in a different order than the SDN entry.
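
A token-level comparison shows why ordering is the easy part: treating a name as an unordered set of tokens neutralizes word order but does nothing for abbreviations or dropped components. A rough sketch with illustrative names:

    import re

    def name_tokens(name: str) -> frozenset:
        # Lowercase, strip punctuation, split into word tokens (order-insensitive).
        return frozenset(re.findall(r"[a-z]+", name.lower()))

    def token_overlap(a: str, b: str) -> float:
        # Jaccard overlap of token sets: 1.0 means identical sets of name parts.
        ta, tb = name_tokens(a), name_tokens(b)
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    sdn_name = "Mohammed bin Salman Al-Saud"
    for crm_name in ("Al-Saud, Mohammed", "Mohammed Salman", "M. Salman"):
        print(crm_name, round(token_overlap(sdn_name, crm_name), 2))
    # Reordering still scores well (0.6); abbreviation collapses the score (0.17).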

Corporate entities create their own problems. "Al-Rahman Trading Company LLC" versus "Al Rahman Trading" versus "Alrahman Trading Co." — same company, three different database entries. Throw in legal suffix variations (LLC, Ltd, GmbH, S.A., S.A.R.L.) and transliteration differences, and corporate screening generates false positive rates as bad as individual names. Sometimes worse.
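
Normalizing punctuation, spacing, and legal-form suffixes before comparison collapses some of that variation. A rough sketch, with a deliberately incomplete suffix list:

    import re

    # Illustrative and deliberately incomplete set of legal-form suffixes.
    LEGAL_SUFFIXES = {"llc", "ltd", "limited", "inc", "co", "company",
                      "gmbh", "sa", "sarl", "fze"}

    def normalize_company(name: str) -> str:
        tokens = re.findall(r"[a-z0-9]+", name.lower())
        tokens = [t for t in tokens if t not in LEGAL_SUFFIXES]
        return "".join(tokens)  # drop spaces and hyphens so "Al Rahman" == "Alrahman"

    variants = ("Al-Rahman Trading Company LLC", "Al Rahman Trading", "Alrahman Trading Co.")
    print({normalize_company(v) for v in variants})  # {'alrahmantrading'}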

Partial matches frustrate everyone involved. Your customer is "Ahmed Hassan Ibrahim." SDN contains "Ahmed Hassan" (different person) and "Hassan Ibrahim" (also different person). Substring matching flags both. Compliance team now investigates two false positives for one legitimate customer who happens to share common name components with completely unrelated designated individuals. Nobody's happy — not your team, not your customer waiting for shipment clearance.

What Can Algorithms Actually Handle Well?

Match scoring helps prioritize the investigation queue. Instead of binary match/no-match, decent screening platforms return confidence scores — 95% match, 78% match, 62% match. Teams can set thresholds: auto-clear below 50%, auto-escalate above 90%, manual review in the middle band.
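
The triage logic itself is trivial; the hard part is picking the cut-offs. A sketch using the example bands above (the 50 and 90 are this section's illustration, not recommended values):

    def triage(score: float, clear_below: float = 50.0, escalate_above: float = 90.0) -> str:
        # Route a screening hit into one of three queues by match score.
        if score < clear_below:
            return "auto-clear"
        if score > escalate_above:
            return "auto-escalate"
        return "manual-review"

    for score in (95, 78, 62, 40):
        print(score, triage(score))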

Scoring doesn't eliminate manual review. It triages the workload. A 95% match on "Mohammed Al-Rahman" still needs human eyes to verify whether your customer is the sanctioned individual or a different person with the same common name. Algorithm can't confirm identity. It measures string similarity. Different problem.

Contextual matching adds value when you have the data. If screening includes date of birth, nationality, or passport number, algorithm can cross-reference against SDN auxiliary fields. Name match with non-matching DOB drops true positive probability dramatically. But most transaction screening doesn't carry this data — you're matching company names and contact names with minimal context.
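
A sketch of how auxiliary fields might downgrade a name-only hit; the multipliers and identifiers below are invented for illustration, and missing fields deliberately leave the score alone because absent data proves nothing:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Party:
        name: str
        dob: Optional[str] = None      # ISO date string; often missing in practice
        country: Optional[str] = None

    def adjusted_score(name_score: float, customer: Party, sdn: Party) -> float:
        # Downgrade a name match when auxiliary identifiers actively conflict.
        score = name_score
        if customer.dob and sdn.dob and customer.dob != sdn.dob:
            score *= 0.3
        if customer.country and sdn.country and customer.country != sdn.country:
            score *= 0.7
        return score

    listed = Party("Mohammed Al-Rahman", dob="1968-04-02", country="SY")
    customer = Party("Mohammed Al-Rahman", dob="1985-11-19", country="AE")
    print(adjusted_score(92.0, customer, listed))  # the name match alone no longer looks decisive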

Alias expansion catches known variations. Good platforms maintain alias databases beyond official SDN entries — common misspellings, regional patterns, nickname conventions. Reduces false negatives at the cost of more false positives. Trade-off sits at the core of every screening configuration.

Whitelisting handles established relationships. You investigate "Ahmed Hassan" at regular customer Acme Trading, confirm he's not the sanctioned Ahmed Hassan, add him to whitelist. Future screens auto-clear. Cuts investigation burden for repeat business significantly. But whitelist maintenance creates its own workload — and most teams neglect it. People change names. Companies change ownership. Sanctioned parties acquire stakes in legitimate businesses. Whitelist that never gets reviewed becomes a compliance gap waiting to blow up during an audit.
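
A whitelist entry is only safe if it is narrow and expiring: keyed to one customer and one list entry, carrying the clearance rationale, and resurfacing for periodic review. A sketch, with an illustrative review interval:

    from dataclasses import dataclass
    from datetime import date, timedelta

    REVIEW_INTERVAL = timedelta(days=365)  # illustrative policy, not a regulatory number

    @dataclass
    class WhitelistEntry:
        customer_id: str
        list_entry_uid: str   # the specific SDN/list entry this clearance covers
        cleared_on: date
        rationale: str        # why the match was judged a false positive

    def still_valid(entry: WhitelistEntry, today: date) -> bool:
        # Suppress only the same customer/list-entry pair, and only until review is due.
        return today - entry.cleared_on < REVIEW_INTERVAL

    entry = WhitelistEntry("CUST-0042", "SDN-12345", date(2025, 3, 1),
                           "DOB and nationality differ from the designated individual")
    print(still_valid(entry, date.today()))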

Where Does Manual Review Actually Matter?

Human reviewers do what algorithms can't — verify identity through contextual investigation. Algorithm flags "Mohammed Al-Rahman" as potential SDN match. Reviewer pulls up the SDN entry, checks addresses, looks at the designated individual's known business activities, compares against what your customer actually does. Does any of it line up? Or is this just another guy named Mohammed in a region where half the male population shares that name?

This takes time. We've measured it across operations: 8-15 minutes per hit for experienced compliance staff. Complex cases run longer. Staff still learning the lists take 20+ minutes. A team processing 100 screenings daily with a 15% hit rate faces 15 investigations — roughly two to four hours of daily work just clearing false positives. That's close to half a full-time role hiding inside your compliance function.
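
The arithmetic is worth making explicit, because it is what drives staffing decisions. Using the figures from this section:

    screenings_per_day = 100
    hit_rate = 0.15                 # share of screenings returning a potential match
    minutes_per_hit = (8, 15)       # experienced-reviewer range cited above

    hits = screenings_per_day * hit_rate
    low, high = (hits * m / 60 for m in minutes_per_hit)
    print(f"{hits:.0f} investigations/day, {low:.1f} to {high:.1f} hours of review")
    # 15 investigations/day, 2.0 to 3.8 hours of review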

Manual review also catches what algorithms miss. Names that look different but represent the same person. Names that look identical but belong to different people entirely. Transliteration errors in source data. Outdated aliases OFAC hasn't removed. An algorithm processes character strings. A human reviewer understands that "Acme General Trading FZE" in Dubai and "Acme Gen. Trading" in your invoice probably refer to the same company despite string differences.

Scaling problem becomes obvious fast. Manual review time doesn't compress with volume. Fifty investigations take five times longer than ten. Compliance teams hitting capacity either slow down transaction processing or rush investigations. First option creates operational drag your sales team complains about. Second option creates compliance risk your general counsel worries about. Neither works long-term.

What Doesn't Work in Name Matching?

Over-tuning to cut false positives increases false negatives. We've watched companies tighten matching thresholds to reduce investigation workload — then miss actual matches because name spelling varied just enough to fall below threshold. One client shipped to an Entity List party three times in early 2024 before catching it. Their algorithm was tuned so tight that "Huawei" matched but "Hua Wei Technologies" didn't hit.

Relying on a single matching method leaves gaps. Fuzzy matching catches spelling variations but misses phonetic ones. Phonetic matching catches sound-alikes but struggles with non-Western names. Token matching handles word-order variations but breaks on compound names. Effective screening layers multiple methods — which multiplies false positive volume.
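
Layering looks roughly like this: flag when any layer fires, which is precisely why the investigation queue grows. A sketch that reuses the levenshtein, soundex, and token_overlap helpers from the earlier sketches, with illustrative thresholds:

    def layered_hit(customer: str, listed: str) -> bool:
        # Each layer covers a different failure mode; the union of all three
        # is what multiplies false-positive volume.
        fuzzy = levenshtein(customer.lower(), listed.lower()) <= 2
        phonetic = bool({soundex(t) for t in customer.split()} &
                        {soundex(t) for t in listed.split()})
        tokens = token_overlap(customer, listed) >= 0.5
        return fuzzy or phonetic or tokens

    print(layered_hit("Mohamed Al Rahman", "Mohammad Al-Rahman"))  # True - wanted
    print(layered_hit("Joan Smith", "John Smith"))                 # True - false positive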

Assuming the algorithm handles it means nobody handles it. We've seen compliance programs where the screening software became the compliance program. Hit comes back, analyst checks the score, clears anything below 70% without investigation. That's not compliance. That's checkbox theater. Algorithm doesn't know your customer. Doesn't know the transaction context. Doesn't know that "Ahmed Trading" ordering dual-use chemicals for shipment to a UAE free trade zone maybe deserves a closer look regardless of match score.

Ignoring name quality in source data guarantees problems. Customer enters their name as "Mo Al-R" in your webform. You screen "Mo Al-R" against SDN. No hits. Actual customer name is "Mohammed Al-Rahman" — which would have flagged. Garbage in, garbage out. Screening accuracy depends entirely on input data quality. Most companies don't audit their customer data quality against screening requirements.

How Should Teams Balance Automation and Manual Review?

Set thresholds based on actual hit data, not vendor defaults. Run your customer database against your screening platform. Measure hit rates at different score thresholds. Find where false positive volume becomes manageable without creating false negative exposure. This calibration is specific to your customer base — company with Gulf-heavy customer list needs different thresholds than company selling exclusively to Canada.
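
Calibration is mostly counting. Run records you already know are clean through the screen at several thresholds and see where the queue becomes workable; the screen function below is a stand-in for whatever lookup your platform actually exposes:

    def calibrate(customer_names, screen, thresholds=(60, 70, 80, 90)):
        # 'screen' is a hypothetical callable: it returns the potential matches
        # for a name at a given minimum score on your screening platform.
        hit_rates = {}
        for t in thresholds:
            hits = sum(1 for name in customer_names if screen(name, threshold=t))
            hit_rates[t] = hits / len(customer_names)
        return hit_rates  # e.g. {60: 0.31, 70: 0.18, 80: 0.09, 90: 0.03}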

Staff manual review appropriately. One analyst can realistically handle 30-40 investigations daily without cutting corners. More than that and shortcuts start — we've seen it happen. Calculate your expected hit volume, apply your false positive rate, staff accordingly. Understaffed compliance teams don't produce compliant operations. They produce backlogs, burnout, and eventually the kind of mistakes that end up in OFAC settlement announcements.

Document investigation rationale, not just outcomes. When you clear a false positive, record why. What identifiers distinguished your customer from the SDN entry? What contextual factors supported clearance? This documentation matters when BIS or OFAC comes asking how you determined the match wasn't real. "The score was only 67%" isn't a defensible answer. "Customer DOB doesn't match, address is different country, business line is unrelated" — that's defensible.

Rescreen on list updates, not just new transactions. OFAC adds roughly 200 designations annually (Treasury.gov, 2025). Customer you cleared last spring might match a new SDN entry added in fall. Batch rescreening catches these. Most companies don't do it often enough. Monthly minimum. Weekly if you ship to high-risk destinations.
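
Operationally that means a scheduled batch job over every active customer, keyed to the current list version rather than whatever was cached at onboarding. A sketch, where screen and record_hit stand in for your platform's lookup and case-management calls:

    from datetime import date

    def rescreen_all(customers, screen, list_version: date, record_hit) -> None:
        # Rescreen every active customer against the CURRENT list version,
        # not just the names attached to new transactions.
        for customer in customers:
            for match in screen(customer.name, list_version=list_version):
                record_hit(customer, match, screened_on=date.today())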

FAQ

What's a reasonable false positive rate for sanctions screening?

Industry benchmarks run 60-75% for mid-market exporters screening OFAC and EU lists combined. Lower rates usually mean thresholds set too tight. Higher rates indicate high-risk customer profile or configuration problems.

How long should manual investigation take?

8-15 minutes for straightforward cases with experienced staff. Complex cases — corporate structures, multiple potential matches, thin customer data — can run 30+ minutes. If your team averages under 5 minutes per investigation, they're rubber-stamping, not investigating.

Can we automate manual review with better algorithms?

Partially. Better contextual matching reduces obvious false positives. But identity verification — confirming your customer isn't the sanctioned person — requires human judgment on ambiguous cases.

What match score threshold should we use?

Depends on your customer base and risk tolerance. Companies with many Middle Eastern or Russian customers need lower thresholds than companies selling to Western Europe. Start at vendor default, measure hit patterns, adjust based on data.

How often should we update screening lists?

OFAC updates 3-4 times weekly on average. Your screening platform should pull updates at least daily. Batch screening must run against current list data, not cached data from last week.

Name matching algorithms do the heavy lifting — screening thousands of transactions against tens of thousands of list entries in seconds. But algorithms produce probability scores, not identity verification. Manual review closes that gap. Companies that understand where algorithms stop and human judgment starts build screening programs that actually work. Platforms like Lenzo, Descartes, and Dow Jones provide matching infrastructure and scoring. Your compliance team provides the judgment that turns scores into decisions.

Sources