Regular Expression Tester

Catastrophic Backtracking — How a Regex Took Down Cloudflare

On July 2, 2019 at 13:42 UTC, CPU usage on Cloudflare's edge servers spiked to nearly 100% worldwide and most HTTP traffic stopped flowing. The cause, per their public post-mortem, was a single new WAF rule deployed minutes earlier:

(?:(?:\"|'|\]|\}|\\|\d|(?:nan|infinity|true|false|null|undefined|symbol|math)|\`|\-|\+)+[)]*;?((?:\s|-|~|!|{}|\|\||\+)*.*(?:.*=.*)))

The critical part is the tail, .*(?:.*=.*): several greedy .* segments in a row can divide the same text among themselves in many different ways, and on certain inputs PCRE tried those divisions one after another. The engine entered a runaway state, consuming the CPU of every machine that evaluated the rule, and the outage lasted 27 minutes. The post-mortem named the failure mode: catastrophic backtracking, also known as ReDoS (Regex Denial of Service).

The mechanism is structural. Most production regex engines (PCRE, Java's java.util.regex, Python's re, Ruby's Onigmo, JavaScript's V8 Irregexp) implement matching with backtracking. When a match attempt fails late in the pattern, the engine rewinds and tries an alternative path. With nested quantifiers, in patterns like (a+)+, (.*)*, or (\w*)*$, the same run of characters can be divided between the inner and outer repetitions in exponentially many ways. A 30-character input can produce 2^30 ≈ 1 billion partition attempts before the engine gives up. The CPU can spend seconds, minutes, or for longer inputs effectively forever on a single regex match.

The patterns to watch for:
(1) Any quantifier wrapped around a quantifier: (a+)+ is the textbook case.
(2) A repeated alternation whose branches can match the same text: (a|aa)+.
(3) Greedy quantifiers across an unbounded segment, followed by something that may not match: ^(a+)+$ tested against "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab" makes most backtracking engines crawl.

Three defenses exist.
(1) Use an engine that is structurally immune. RE2 (Google), Hyperscan, Rust's regex crate, and Go's regexp are automata-based and run in time linear in the input length; they do not backtrack. The cost is features: these engines drop backreferences and lookaround assertions, but for the large majority of real-world patterns that is fine. RE2 was designed specifically so that untrusted patterns and inputs could be matched safely.
(2) Set a wall-clock timeout on every regex evaluation. Java's java.util.regex has no built-in timeout; a common workaround is to run the match on a separate thread and wrap the input in a CharSequence whose charAt checks for interruption or a deadline. .NET added the Regex(pattern, options, matchTimeout) constructor explicitly for this reason.
(3) Lint your patterns. Tools like vuln-regex-detector, safe-regex, and the detect-unsafe-regex rule in eslint-plugin-security flag the worst structural shapes before they ship.

A pragmatic rule: if you accept regex from end users, NEVER use the language's default engine without a timeout. If you write the pattern yourself, run it past one of the linters, especially before deploying to a request-path WAF or input validator. Per the post-mortem, the 2019 rule ran on a backtracking engine with no effective CPU guard in place.
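To make the partition explosion concrete, here is a minimal sketch of a toy backtracking matcher hard-coded for ^(a+)+$. It is not how V8 or PCRE are implemented, and countAttempts is a hypothetical name used only for illustration; it simply counts how many ways of splitting the run of a's a naive backtracker would try.

```javascript
// Toy backtracking matcher for the single pattern ^(a+)+$ (illustration only).
// It counts every choice of "where does this a+ iteration stop?".
function countAttempts(input) {
  let attempts = 0;

  // Try to match one or more iterations of a+ starting at pos, then end of input.
  function matchFrom(pos) {
    // Greedy inner a+: find the longest run of 'a' from pos.
    let end = pos;
    while (end < input.length && input[end] === 'a') end++;

    // Backtrack from the longest run down to a single 'a'.
    for (let stop = end; stop > pos; stop--) {
      attempts++;
      if (stop === input.length) return true; // $ matches here
      if (matchFrom(stop)) return true;       // another outer iteration
    }
    return false;
  }

  const matched = matchFrom(0);
  return { matched, attempts };
}

console.log(countAttempts('aaaa'));                         // matches on the first attempt
console.log(countAttempts('a'.repeat(10) + 'b').attempts);  // 1023
console.log(countAttempts('a'.repeat(20) + 'b').attempts);  // 1048575
```

Each extra "a" before the failing "b" doubles the work (2^n - 1 attempts for n a's), which is why inputs of only a few dozen characters are enough to stall a real engine.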
// Catastrophic patterns
/^(a+)+$/.test('a'.repeat(30) + 'b');
// V8: typically several seconds or more; each extra 'a' roughly doubles the time.

/(.*)*=/.test('=' + 'a'.repeat(50));
// Even worse with inner unbounded greedy.

// Production safety in Node
const RE2 = require('re2');         // linear time
const safe = new RE2('^(a+)+$');    // same anchored pattern as above
safe.test('a'.repeat(30) + 'b');    // returns instantly, false

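// Native regex with watchdog (Node worker_threads)
const { Worker } = require('worker_threads');

function timedRegex(pattern, input, ms = 50) {
  return new Promise((resolve, reject) => {
    // The worker source is evaluated as CommonJS, so it must use parentPort
    // (there is no global onmessage in Node workers).
    const w = new Worker(`
      const { parentPort } = require('worker_threads');
      parentPort.once('message', ({ p, s }) => {
        parentPort.postMessage(new RegExp(p).test(s));
      });
    `, { eval: true });
    // Note: the budget also covers worker startup, so very small values can time out spuriously.
    const timer = setTimeout(() => { w.terminate(); reject(new Error('regex timeout')); }, ms);
    w.once('message', (result) => { clearTimeout(timer); resolve(result); w.terminate(); });
    w.once('error', (err) => { clearTimeout(timer); reject(err); });
    w.postMessage({ p: pattern, s: input });
  });
}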

// .NET — built-in timeout
var re = new Regex(pattern, RegexOptions.None, TimeSpan.FromMilliseconds(50));

Regex Engine Comparison: PCRE vs RE2 vs JavaScript V8

Three engines dominate modern regex implementations, and they differ in important ways.

PCRE (Perl Compatible Regular Expressions). The C library that powers PHP, nginx, Apache, Cloudflare's WAF historically, and many others. Backtracking-based, with the richest feature set: backreferences, lookbehind (which must have a bounded length), recursion, atomic groups, possessive quantifiers, conditional patterns, named captures in all the common syntaxes, and callouts. The cost is the worst-case behavior: exponential time on adversarial input. PCRE's JIT compiler, carried forward into PCRE2 (the current version of the library), makes the average case competitive with RE2 in raw matching speed but does not change the worst case.

RE2 (Google). A C++ library by Russ Cox. Go's standard regexp package is a separate implementation of the same approach and syntax. Built on Thompson's NFA simulation algorithm: the engine maintains a set of possible automaton states and advances them in lockstep through the input. Matching time is linear in the input length for a given pattern. The trade-off is what cannot be supported: backreferences (which require remembering arbitrary previously matched text) and lookaround assertions, both lookahead and lookbehind. For the large majority of regex patterns in real codebases, RE2 is a near drop-in replacement that eliminates ReDoS at the cost of a few less common features. RE2 grew out of Google Code Search, which had to run untrusted, user-supplied patterns safely.

JavaScript V8 (Irregexp). The regex engine in Chrome and Node. Backtracking-based, but heavily optimized: it can compile regex patterns to native code. Supports lookbehind (since ES2018), named captures ((?<name>...)), Unicode property escapes (\p{Letter}), the s flag (dotAll), and the d flag (match indices). It does NOT have a timeout primitive: a runaway pattern hangs the tab or blocks the Node event loop. Since v8.8 (2021), V8 also ships an experimental non-backtracking engine behind the --enable-experimental-regexp-engine flag, which adds an l (linear) regex flag, but it is not on by default and not exposed in standard JS.

The practical guide: server side, prefer an RE2-style engine (Go natively, the re2 npm package in Node, the regex crate in Rust, google-re2 in Python) for any pattern that touches user input. Use PCRE if you need its richer feature set and you trust the patterns. Browser side, accept that V8 has no built-in protection: keep patterns simple, avoid nested quantifiers, and prefer constructs that limit backtracking (atomic groups, (?>x), are not in JS, but you can emulate one with a lookahead plus backreference, /(?=(x))\1/). Whenever possible, replace a complex regex with explicit string operations: string.includes, string.split, and indexOf have predictable cost and never backtrack.

Feature mismatch matters. Code copied from a Stack Overflow PCRE answer often does not work in JavaScript: possessive quantifiers (a++), atomic groups, recursion and subroutine calls ((?R)), conditionals, and some Unicode classes are not available there. The mismatch runs in both directions: JavaScript accepts unbounded lookbehind such as (?<=\w+), which PCRE rejects, and JavaScript-specific features include the d (hasIndices) flag and the v (unicodeSets) flag (ES2024). When porting between engines, run the pattern against your test cases, because silent semantic differences are common.
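The "set of states advanced in lockstep" idea can be sketched in a few lines. This is a minimal illustration, not RE2's implementation: the NFA for ^(a+)+$ is built by hand (a real engine compiles it from the pattern), and simulate and addWithClosure are hypothetical names.

```javascript
// Hand-built NFA for ^(a+)+$ (an assumption for illustration).
// State 0 --'a'--> 1; state 1 --ε--> 0 (loop) and --ε--> 2; state 2 accepts.
const nfa = [
  { on: { a: [1] }, eps: [] },
  { on: {}, eps: [0, 2] },
  { on: {}, eps: [], accept: true },
];

// Add a state plus everything reachable from it through ε-transitions.
function addWithClosure(states, set, id) {
  if (set.has(id)) return;
  set.add(id);
  for (const next of states[id].eps) addWithClosure(states, set, next);
}

// Advance the whole set of live states one input character at a time.
function simulate(states, input) {
  let current = new Set();
  addWithClosure(states, current, 0);
  let steps = 0;

  for (const ch of input) {
    steps++;
    const next = new Set();
    for (const id of current) {
      for (const target of states[id].on[ch] ?? []) {
        addWithClosure(states, next, target);
      }
    }
    current = next;
    if (current.size === 0) break; // no live states left: the match cannot succeed
  }

  const consumedAll = steps === input.length;
  const matched = consumedAll && [...current].some((id) => states[id].accept);
  return { matched, steps };
}

console.log(simulate(nfa, 'aaaa'));                  // { matched: true, steps: 4 }
console.log(simulate(nfa, 'a'.repeat(30) + 'b'));    // { matched: false, steps: 31 }
```

The adversarial input that costs a backtracking engine about a billion attempts is rejected here after 31 steps, because the simulation never revisits a character: the number of live states is bounded by the size of the NFA.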
// PCRE-only that fails in JS:
preg_match('/^(?P>group)$(?(DEFINE)(?<group>foo|bar))/', $s);  // subroutine call
preg_match('/\d++px/', $s);                                    // possessive quantifier

// JavaScript-only:
'abcabc'.matchAll(/(?<x>\w)/dg);  // d flag → result[i].indices
[...'abc'.matchAll(/[\p{Letter}--[a]]/v)]; // v flag set difference

// Cross-engine safe baseline (works everywhere):
const safePattern = /^[a-zA-Z][a-zA-Z0-9_]{0,31}$/;

// Server: use RE2 in Node
const RE2 = require('re2');
const safe = new RE2('^([a-z]+)\\s+(\\d+)$');  // backslashes must be doubled inside a string literal
const m = safe.match('alice 42');
m;  // ['alice 42', 'alice', '42']

// Replace regex with string ops when you can — guaranteed O(n)
const isEmail = (s) => s.includes('@') && s.indexOf('@') === s.lastIndexOf('@');
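
The lookahead-plus-backreference trick for emulating an atomic group in JavaScript can be sketched as follows. The variable names are illustrative. It relies on the fact that a JavaScript lookahead is not re-entered once it has succeeded, so whatever it captured is fixed and \1 consumes exactly that text.

```javascript
// Backtracking-prone original, shown for comparison only (not executed on long input here).
const naive = /^(a+)+$/;

// Emulated atomic inner group: (?=(a+)) captures the longest run of a's,
// then \1 consumes it. The engine cannot go back and try a shorter run.
const atomicLike = /^(?:(?=(a+))\1)+$/;

console.log(atomicLike.test('aaaa'));                 // true
console.log(atomicLike.test('a'.repeat(30) + 'b'));   // false, without the exponential search

// The emulation changes semantics when a match needs the group to give text back:
console.log(/^a+a$/.test('aa'));           // true  (a+ backs off by one character)
console.log(/^(?=(a+))\1a$/.test('aa'));   // false (the "atomic" a+ keeps both characters)
```

The last two lines are the design trade-off: atomic behavior removes the backtracking that causes ReDoS, but also the backtracking some patterns depend on, so test the rewritten pattern against known-good inputs.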

Lookahead and Lookbehind — When You Actually Need Them

Lookarounds are zero-width assertions: they check what comes before or after the current position without consuming characters. There are four flavors: positive lookahead (?=...), negative lookahead (?!...), positive lookbehind (?<=...), negative lookbehind (?<!...). They are powerful and frequently overused. Many of the lookarounds you will see in regex tutorials could be written as plain regex with capture groups instead, and would be faster and easier to read.

The legitimate uses fall into a few categories.

(1) Tokenizing without consuming a delimiter. Splitting "abc123def" into letters and digits with a regex like /(?<=\D)(?=\d)|(?<=\d)(?=\D)/ places the split points at the boundary between letter and digit without consuming any characters, so .split returns ["abc","123","def"]. The same effect with a non-zero-width regex requires explicit reassembly.

(2) Asserting a constraint outside the matched region. "Find a number that is followed by a unit, but keep the unit out of the match": /\d+(?=px|em|rem)/. Without lookahead you would either include the unit in the match (and post-process to strip it) or use a capture group and refer to group 1. Lookahead is cleaner.

(3) Validating multiple independent constraints in one pattern. The classic "password must contain a digit, a letter, a special character, and be 12+ chars" pattern: /^(?=.*\d)(?=.*[a-zA-Z])(?=.*[!@#$%])\S{12,}$/. Each (?=...) checks one rule against the whole string without advancing. This works and is concise, but it is also the canonical example where multiple separate validations would be clearer in application code.

(4) Word boundaries inside Unicode. JavaScript's \b is defined in terms of ASCII word characters. To assert a "letter boundary" against Unicode text you need /(?<![\p{Letter}\p{Mark}])foo(?![\p{Letter}\p{Mark}])/u with the Unicode flag.

When NOT to use lookarounds: any time a capture group would do. A pattern like /(\d+)px/ matched against "12px" gives match[1] = "12" without any lookahead. Replace /(?<=\$)\d+/.exec("$42")[0] with /\$(\d+)/.exec("$42")[1]: same data, simpler regex, and it runs in every engine, including ones with restricted or no lookbehind.

A subtle performance footgun: in backtracking engines, lookarounds inside quantified groups can multiply the worst-case work, because each backtrack step re-evaluates the assertion.

Engine support varies. RE2, Go's regexp, and Rust's regex crate support no lookarounds at all. Java requires lookbehind to have a bounded maximum length. Python's re supports fixed-length lookbehind only. ECMAScript has supported both lookahead and unbounded lookbehind since ES2018, but Safari shipped lookbehind support only in 2023, so if you need to support older Safari, lookbehind is off the table.
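The Unicode boundary case is easiest to see side by side. This is a small sketch with illustrative variable names; it assumes a runtime with lookbehind and Unicode property escapes (ES2018 or later).

```javascript
// \b only knows ASCII word characters, so it sees a boundary between "é" and "f".
const asciiBoundary = /\bfoo\b/;

// Letter-aware boundary built from negative lookbehind and lookahead.
const unicodeBoundary = /(?<![\p{Letter}\p{Mark}])foo(?![\p{Letter}\p{Mark}])/u;

const glued = '\u00e9foo'; // "éfoo": foo is part of a longer word

console.log(asciiBoundary.test(glued));       // true  (false positive)
console.log(unicodeBoundary.test(glued));     // false
console.log(unicodeBoundary.test('a foo.'));  // true
```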
// Tokenize on letter/digit boundary (zero-width split)
'abc123def'.split(/(?<=\D)(?=\d)|(?<=\d)(?=\D)/);
// → ['abc', '123', 'def']

// Match number followed by CSS unit, exclude the unit
const px = '20px 1rem 3em'.match(/\d+(?=px|em|rem)/g);
// → ['20', '1', '3']

// Password rule with lookaheads
/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*])\S{12,}$/.test(pw);

// Lookbehind for dollar amount
'price $42 and €50'.match(/(?<=\$)\d+/);  // ['42']

// SAME RESULT without lookbehind (more portable)
'price $42 and €50'.match(/\$(\d+)/)[1];   // '42'

// AVOID: nested-quantifier-with-lookahead — slow on adversarial input
/^(.+(?=.+))+$/.test('a'.repeat(20) + 'X');  // backtracks badly
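
For comparison, the multi-lookahead password rule can be written as separate checks in application code. This is a sketch, and checkPassword and its messages are hypothetical; it applies the same rules as /^(?=.*\d)(?=.*[a-zA-Z])(?=.*[!@#$%])\S{12,}$/ but reports which rule failed.

```javascript
// Same constraints as the single-regex version, one rule per line.
function checkPassword(pw) {
  const failures = [];
  if (pw.length < 12) failures.push('must be at least 12 characters');
  if (/\s/.test(pw)) failures.push('must not contain whitespace');
  if (!/\d/.test(pw)) failures.push('must contain a digit');
  if (!/[a-zA-Z]/.test(pw)) failures.push('must contain a letter');
  if (!/[!@#$%]/.test(pw)) failures.push('must contain one of !@#$%');
  return failures; // an empty array means the password passed every rule
}

console.log(checkPassword('correct-horse1!')); // []
console.log(checkPassword('short'));
// [ 'must be at least 12 characters', 'must contain a digit', 'must contain one of !@#$%' ]
```

Each individual pattern is a trivial scan with nothing to backtrack over, and the caller gets an actionable error list instead of a single true or false.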

About this tool

A regex tester evaluates a JavaScript regular expression against a sample string and shows every match, capture group, and index. Regex is the universal pattern-matching mini-language built into nearly every text editor, search tool, and programming language. A live tester turns the trial-and-error of building patterns into a fast feedback loop.

How to use

  1. Enter your regex pattern in the pattern field — without surrounding slashes.
  2. Add flags such as g (global), i (case-insensitive), m (multiline), s (dotall), or u (unicode).
  3. Paste the test string in the text area below.
  4. Read every match, its index, and any capture groups in the results panel.
  5. Use the common patterns shortcut to start from a working example like email, URL, or date.

Common use cases

  • Validating email or phone-number input on a form.
  • Extracting all URLs from a chunk of plain text.
  • Building a search-and-replace pattern for a large code refactor.
  • Parsing log lines into structured fields with capture groups.
  • Testing a router or middleware regex before shipping it.
  • Stripping or normalising whitespace, punctuation, or accent characters.

Frequently asked questions

Q. Why does my pattern match nothing?

A. Check anchors (^ and $) and the multiline flag, escape special characters (. needs to be \.), and confirm whether you really want greedy or lazy quantifiers.

Q. Are these patterns portable to other languages?

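A. Partly. JavaScript regex syntax is close to PCRE, and most simple patterns work in Python and Java with minor syntax differences (named groups, lookbehind limits). Go's regexp uses RE2 syntax, which has no lookarounds or backreferences, so patterns that rely on them must be rewritten.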

Q. What does the g flag actually do?

A. g (global) tells the engine to find every match instead of only the first. Without it, .match returns the first match plus its groups, not an array of matches.

Q. Can regex parse HTML?

A. Famously not reliably. Regex cannot match arbitrarily nested structures. Use a real HTML parser; reach for regex only for simple, line-bounded extraction.