The Complete Guide to Regular Expressions

This page is extra context on top of the full article under the tool. Nothing here replaces that copy — it adds history, engine internals, cross-language differences, and production angles.

A brief history of regex

Ken Thompson published his regular expression search algorithm in 1968 and built regular expressions into the QED editor and, later, ed on Unix. From ed's g/re/p command came grep: globally search for a regular expression and print. sed, awk, and lex followed, each extending the syntax slightly.

Perl (1987) changed everything. Larry Wall embedded regex so deeply into the language that patterns became a first-class construct. Over successive releases, Perl introduced or popularized lookahead, lookbehind, non-greedy quantifiers, named captures, and dozens of other features that people now assume "regex" includes.

PCRE (Perl-Compatible Regular Expressions) reimplemented Perl's regex syntax and semantics as a standalone C library so other software could use it. PHP, Nginx, and many others adopted it directly.

JavaScript has had RegExp since JavaScript 1.2 (1997), standardized in ES3 (1999), but with a minimal feature set. It's been catching up: lookbehind assertions arrived in ES2018, the d (match indices) flag in ES2022, and the v (Unicode sets) flag in ES2024.

How regex engines work

There are two fundamental approaches:

NFA (nondeterministic finite automaton), implemented as a backtracking matcher: used by JavaScript, Python, Perl, Java, C#, and most other languages. The engine tries one path through the pattern and, if that path fails, backs up and tries a different branch. This is what enables features like backreferences and lookaround, but it's also what causes catastrophic backtracking.

DFA (deterministic finite automaton): used by RE2, Go's regexp package (which follows RE2's design), awk, and grep. The engine never backtracks, so matching time is guaranteed to be linear in the input length. The tradeoff: no backreferences, no lookahead or lookbehind.

When you test a regex in this tool, you're using JavaScript's NFA engine.

JavaScript regex vs other engines

JavaScript's RegExp is not PCRE. If you're porting patterns from PHP, Perl, or a PCRE-based tool, expect breakage.

Feature	JavaScript	PCRE / Perl	Python `re`	Go RE2
Named groups	`(?<name>...)`	`(?<name>...)`	`(?P<name>...)`	`(?P<name>...)`
Lookbehind	✅ (ES2018+)	✅	✅	❌
Atomic groups	❌	✅	✅ (3.11+)	N/A (no backtracking)
Possessive quantifiers	❌	✅	✅ (3.11+)	N/A
Recursion	❌	✅	❌ (use `regex` pkg)	❌
Conditional patterns	❌	✅	✅	❌
Unicode properties (`\p{...}`)	✅ (with `u`/`v` flag)	✅	❌ (use `regex` pkg)	✅

If your pattern uses atomic groups or possessive quantifiers (common in PCRE for performance), you'll need to restructure it for JavaScript.
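
One common restructuring, sketched here rather than prescribed: a JavaScript lookahead never gives back what it matched, so capturing inside a lookahead and then consuming the capture with a backreference behaves much like an atomic group. The patterns and variable names below are illustrative only.

```javascript
// PCRE's (?>a+)ab can never match "aaab": the atomic group takes every "a"
// and refuses to give one back for the literal "a" that follows.
// JavaScript has no (?>...) syntax, but a lookahead is atomic once it succeeds,
// so (?=(a+))\1 captures the run of a's and then consumes exactly that run.
const plain = /a+ab/;               // ordinary quantifier: backtracking allowed
const atomicLike = /(?=(a+))\1ab/;  // emulated atomic group

const plainResult = plain.test("aaab");       // a+ gives back one "a"
const atomicResult = atomicLike.test("aaab"); // the run of a's is never split

// The same trick tames a nested quantifier: each iteration must take the
// whole run of a's, so there is only one way to divide the input.
const safeNested = /^(?:(?=(a+))\1)+b/;
const acceptsGood = safeNested.test("aaab");
const rejectsBad = !safeNested.test("a".repeat(40) + "c");

console.log(plainResult, atomicResult, acceptsGood, rejectsBad);
```

The emulation costs a capture group, so later group numbers shift by one. Named groups avoid that bookkeeping.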

Catastrophic backtracking explained

Pattern: (a+)+b tested against "aaaaaaaaac".

The inner a+ matches some a's. The outer + repeats. When the engine hits c instead of b, it backtracks and tries every possible way to partition the a's between the inner and outer quantifier. For n a's, that's on the order of 2ⁿ combinations. Ten a's: about a thousand attempts. Twenty a's: about a million. Thirty a's: your tab freezes.

Real-world example: ^([\w.]+)+@ for email validation. The nested + on overlapping character classes is the same trap.

How to avoid it:

Don't nest quantifiers on character classes that overlap ((a+)+, (\w+)+)
Use more specific character classes instead of .
Use + instead of * when you need at least one match
For server-side code handling untrusted patterns, consider RE2 (linear-time guarantee)
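
As a rough illustration of the first rule, the nested email pattern from above can be flattened into a single quantifier. This is a sketch with made-up sample strings; the only behavioral difference is that the flat version has no capture group.

```javascript
// Vulnerable: the inner + and the outer + can split a run of word characters
// in exponentially many ways when the "@" is missing.
const nested = /^([\w.]+)+@/;
// Rewritten: one quantifier, so only one way to divide the same run.
const flat = /^[\w.]+@/;

// Short samples only: the nested form is safe to run on these.
const samples = ["alice@example.com", "a.b.c@x", "@missing-local", "no at sign"];
const agree = samples.every((s) => nested.test(s) === flat.test(s));

// A long run with no "@" is the worst case for the nested form. The flat
// form rejects it in time proportional to the input length.
const hostile = "a".repeat(50000) + "!";
const flatRejectsHostile = !flat.test(hostile);

console.log(agree, flatRejectsHostile);
```

Do not run the nested pattern against the hostile string unless you want to see the freeze first-hand.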

Since this tool runs in your browser, a catastrophic pattern freezes your tab — not a server. Reload if it hangs.

Regex in production

Input validation: useful, but don't over-validate. For emails, check for @ and send a confirmation. The often-cited regex for full address validation (written against RFC 822, the predecessor of RFC 5322) is 6,000+ characters and still doesn't handle everything.

Log parsing — named capture groups make extraction readable:

/(?<ip>\d+\.\d+\.\d+\.\d+) - - \[(?<date>[^\]]+)\] "(?<method>\w+) (?<path>\S+)/
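
A minimal sketch of using that pattern in JavaScript. The sample line is an assumed Common Log Format entry, and the variable names are illustrative.

```javascript
// Assumed sample line in Common Log Format style.
const logLine =
  '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326';

const accessLog =
  /(?<ip>\d+\.\d+\.\d+\.\d+) - - \[(?<date>[^\]]+)\] "(?<method>\w+) (?<path>\S+)/;

const match = accessLog.exec(logLine);
// exec returns null when the line doesn't fit the pattern, so guard before
// reading the groups.
const fields = match ? { ...match.groups } : null;

console.log(fields);
```

Note that the literal `- -` only matches lines where the identity and user fields are both empty, so authenticated requests would need a looser pattern.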

IDE find-and-replace — VS Code's regex mode is one of the fastest ways to refactor across files.

URL routing — frameworks like Express use regex under the hood for path parameters.

Security: ReDoS — Regular Expression Denial of Service exploits catastrophic backtracking in server-side regex. If you accept user-supplied patterns, sandbox them or use RE2.

Common regex recipes

# Email (good enough — don't overthink it)
^\S+@\S+\.\S+$

# Semantic version
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$

# IPv4 address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

# ISO date (YYYY-MM-DD)
^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$
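
The recipes above can be exercised with a few lines of JavaScript. This is a sketch: `recipes` and `matchesRecipe` are hypothetical names, and the sample values are chosen to show where each pattern stops checking.

```javascript
const recipes = {
  email: /^\S+@\S+\.\S+$/,
  semver: /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$/,
  ipv4: /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/,
  isoDate: /^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$/,
};

// Hypothetical helper: true when `value` matches the named recipe.
function matchesRecipe(name, value) {
  return recipes[name].test(value);
}

console.log(matchesRecipe("email", "a@b.co"));          // shape check only
console.log(matchesRecipe("semver", "1.2.3-alpha.1"));  // pre-release is allowed
console.log(matchesRecipe("semver", "01.2.3"));         // leading zero is rejected
console.log(matchesRecipe("ipv4", "256.1.1.1"));        // octet out of range
console.log(matchesRecipe("isoDate", "2024-02-30"));    // shape only, not a calendar check
```

Two limits are worth knowing: the semver recipe does not accept build metadata such as `+build.5`, and the ISO date recipe validates the shape of the string, not whether the day exists in that month.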

Know when regex is not the right tool. HTML parsing, nested structures, context-free grammars — these need a real parser. If you're writing a regex longer than a line, step back and ask if a parser or a simple string split would be clearer.

Troubleshooting

Pattern works in Python but not here: Different engines. Python uses (?P<name>...) for named groups; JavaScript uses (?<name>...). Features from other engines, such as PCRE's \K, atomic groups, and possessive quantifiers, aren't available in JavaScript's RegExp.
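
For the named-group difference specifically, a plain text substitution is often enough. The helper below is a hypothetical sketch, not a full converter: it does not understand escaped parentheses or character classes, and it leaves every other Python-only construct untouched.

```javascript
// Hypothetical helper: rewrites Python-style named groups and named
// backreferences into JavaScript syntax by text substitution.
function pythonNamedGroupsToJs(source) {
  return source
    .replace(/\(\?P<([A-Za-z_]\w*)>/g, "(?<$1>")    // (?P<name>...) -> (?<name>...)
    .replace(/\(\?P=([A-Za-z_]\w*)\)/g, "\\k<$1>"); // (?P=name)     -> \k<name>
}

// Returns true when the source compiles as a JavaScript RegExp.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    return false;
  }
}

const pythonSource = String.raw`(?P<word>\w+) (?P=word)`;
const jsSource = pythonNamedGroupsToJs(pythonSource);
const doubled = new RegExp(jsSource);

console.log(compiles(pythonSource)); // the Python syntax is rejected
console.log(jsSource);
console.log(doubled.test("hello hello"));
```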

Regex hangs the browser: This is usually catastrophic backtracking from nested quantifiers. Simplify the pattern and avoid constructs like (a+)+ or (.*a)*. Reload the tab if it's frozen.

. doesn't match newlines: By default, . matches any character except line terminators (\n, \r, \u2028, \u2029). Enable the s (dotAll) flag, or use [\s\S] as a portable alternative that also works in engines without that flag.
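
A quick way to see the three behaviors side by side (the sample text is arbitrary):

```javascript
const text = "first line\nsecond line";

const defaultDot = /first.*second/.test(text);      // "." stops at the newline
const dotAll = /first.*second/s.test(text);         // s flag: "." matches "\n" too
const classTrick = /first[\s\S]*second/.test(text); // works without the s flag

console.log(defaultDot, dotAll, classTrick);
```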

\b doesn't work with Unicode: JavaScript's \b is defined in terms of \w, which covers only ASCII letters, digits, and underscore, and the u and v flags don't change that. For Unicode-aware boundaries, enable the u or v flag and write the boundary explicitly with lookarounds on \p{L} and \p{N}.
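
One way to spell out such a boundary, as a sketch: treat "word character" as any Unicode letter, digit, or underscore, and assert that none sits on either side. The word and sample strings are arbitrary, and é is written as \u00e9 so the example does not depend on how the file is encoded.

```javascript
// \b sees "é" as a non-word character, so there is no boundary between
// "é" and the space that follows it.
const asciiBoundary = /\bcaf\u00e9\b/u;

// Explicit boundary: no letter, digit, or underscore immediately before or after.
const unicodeBoundary = /(?<![\p{L}\p{N}_])caf\u00e9(?![\p{L}\p{N}_])/u;

const asciiResult = asciiBoundary.test("un caf\u00e9 noir");
const unicodeResult = unicodeBoundary.test("un caf\u00e9 noir");
const pluralResult = unicodeBoundary.test("des caf\u00e9s");

console.log(asciiResult, unicodeResult, pluralResult);
```

The definition of a word character here is an assumption. Scripts that use combining marks may also need \p{M} in the class.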

Matches show but capture groups are empty: You're probably using non-capturing groups (?:...) instead of capturing groups (...). Remove the ?: to capture.
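
The difference is easy to see in a console. The second half of this sketch shows another cause you may hit in your own JavaScript code: `String.prototype.match` with the g flag returns only the full matches, while `matchAll` keeps the groups.

```javascript
const input = "2024-05";

const nonCapturing = input.match(/(?:\d{4})-(?:\d{2})/); // only the full match
const capturing = input.match(/(\d{4})-(\d{2})/);        // full match plus two groups

// With the g flag, match() drops the groups; matchAll() keeps them.
const pairs = "a1 b2";
const globalMatch = pairs.match(/([a-z])(\d)/g);
const globalGroups = [...pairs.matchAll(/([a-z])(\d)/g)].map((m) => m.slice(1));

console.log(nonCapturing.length, capturing.slice(1));
console.log(globalMatch, globalGroups);
```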

Pattern matches too much: Greedy quantifiers (*, +) grab as much as possible. Use lazy variants (*?, +?), or replace . with a more specific character class.
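
A small comparison of the three options on an arbitrary snippet of markup:

```javascript
const html = "<b>bold</b> and <i>italic</i>";

const greedy = html.match(/<.+>/)[0];      // runs from the first "<" to the last ">"
const lazy = html.match(/<.+?>/)[0];       // stops at the first ">"
const specific = html.match(/<[^>]+>/)[0]; // cannot cross a ">" at all

console.log(greedy);
console.log(lazy);
console.log(specific);
```

The lazy and specific versions give the same result here, but the character class version does less backtracking and states the intent more directly.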