The Complete Guide to Regular Expressions

This page is extra context on top of the full article under the tool. Nothing here replaces that copy — it adds history, engine internals, cross-language differences, and production angles.


A brief history of regex

Ken Thompson built regular expressions into the QED text editor in 1968 and carried them over to ed. From that came grep, named after the ed command g/re/p: globally search for a regular expression and print. sed, awk, and lex followed, each extending the syntax slightly.

Perl (1987) changed everything. Larry Wall embedded regex so deeply into the language that patterns became a first-class construct. Perl added lookahead, lookbehind, non-greedy quantifiers, named captures, and dozens of other features that people now assume "regex" includes.

PCRE (Perl-Compatible Regular Expressions) reimplemented Perl's regex syntax and semantics as a standalone C library so other languages could use it. PHP, Nginx, and many others adopted it directly.

JavaScript got RegExp early (JavaScript 1.2, 1997), but with a minimal feature set. It's been catching up: lookbehind assertions arrived in ES2018, the d (match indices) flag in ES2022, and the v (Unicode sets) flag in ES2024.


How regex engines work

There are two fundamental approaches:

NFA (nondeterministic finite automaton) — used by JavaScript, Python, Perl, Java, C#, and most other languages. The engine tries one path through the pattern, and if it fails, it backtracks — backs up and tries a different branch. This is what enables features like backreferences and lookaround, but it's also what causes catastrophic backtracking.

DFA (deterministic finite automaton) — used by Go's RE2, awk, and grep. The engine processes each input character exactly once. Guaranteed linear time, no backtracking. The tradeoff: no backreferences, no lookahead or lookbehind.
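To make "try one path, then back up" concrete, here is a toy backtracking matcher. It is only a sketch: it supports literals, ".", and "x*", the function names (toyMatch, matchHere, matchStar) are made up for this page, and it is not how V8 or any production engine is implemented.

```javascript
// Toy backtracking matcher: literals, "." and "<char>*" only.
// Illustrative sketch; names are hypothetical, not a real engine API.
function matchHere(pattern, text) {
  if (pattern.length === 0) return true; // pattern exhausted: success
  if (pattern[1] === "*") {
    return matchStar(pattern[0], pattern.slice(2), text);
  }
  if (text.length > 0 && (pattern[0] === "." || pattern[0] === text[0])) {
    return matchHere(pattern.slice(1), text.slice(1));
  }
  return false;
}

function matchStar(ch, rest, text) {
  // Greedy: consume as many characters as possible first...
  let i = 0;
  while (i < text.length && (ch === "." || text[i] === ch)) i++;
  // ...then backtrack, giving characters back one at a time.
  for (; i >= 0; i--) {
    if (matchHere(rest, text.slice(i))) return true;
  }
  return false;
}

function toyMatch(pattern, text) {
  // Unanchored search: try every start position.
  for (let start = 0; start <= text.length; start++) {
    if (matchHere(pattern, text.slice(start))) return true;
  }
  return false;
}

console.log(toyMatch("a*b", "aaab"));  // true
console.log(toyMatch("a.c", "xxabc")); // true
console.log(toyMatch("a*b", "aaac"));  // false
```

The loop in matchStar is the backtracking step. A DFA-style engine has no equivalent loop, because it tracks every possible state at once while reading each character a single time.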

When you test a regex in this tool, you're using JavaScript's NFA engine.


JavaScript regex vs other engines

JavaScript's RegExp is not PCRE. If you're porting patterns from PHP, Perl, or a PCRE-based tool, expect breakage.

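| Feature | JavaScript | PCRE / Perl | Python re | Go RE2 |
| --- | --- | --- | --- | --- |
| Named groups | (?<name>...) | (?<name>...) | (?P<name>...) | (?P<name>...) |
| Lookbehind | ✅ (ES2018+) | ✅ | ✅ | ❌ |
| Atomic groups | ❌ | ✅ | ✅ (3.11+) | N/A (no backtracking) |
| Possessive quantifiers | ❌ | ✅ | ✅ (3.11+) | N/A |
| Recursion | ❌ | ✅ | ❌ (use regex pkg) | ❌ |
| Conditional patterns | ❌ | ✅ | ✅ | ❌ |
| Unicode properties (\p{...}) | ✅ (with u/v flag) | ✅ | ❌ (use regex pkg) | ✅ |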

If your pattern uses atomic groups or possessive quantifiers (common in PCRE for performance), you'll need to restructure it for JavaScript.
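One common way to restructure an atomic group for JavaScript is to capture inside a lookahead and then consume the capture with a backreference. The sketch below assumes the PCRE pattern (?>a+)ab as the starting point. The variable names are illustrative only.

```javascript
// PCRE's (?>a+) has no direct JavaScript equivalent.
// Emulation: (?=(a+))\1 captures greedily inside a lookahead, then consumes
// exactly that capture. The engine does not backtrack into a lookahead
// once it has succeeded, so the a+ cannot give characters back.
const plain = /a+ab/;              // a+ may give back one "a"
const atomicLike = /(?=(a+))\1ab/; // behaves like (?>a+)ab

console.log(plain.test("aaab"));      // true
console.log(atomicLike.test("aaab")); // false

// The same trick applied to an email-style prefix:
const emailPrefix = /^(?=([\w.]+))\1@/;
console.log(emailPrefix.test("john.doe@example.com")); // true
```

Note that the emulation adds a capture group, so any later group numbers in the pattern shift by one.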


Catastrophic backtracking explained

Pattern: (a+)+b tested against "aaaaaaaaac".

The inner a+ matches some a's. The outer + repeats. When the engine hits c instead of b, it backtracks and tries every possible way to partition the a's between the inner and outer quantifier. For n a's, that's on the order of 2ⁿ paths. Ten a's: about a thousand attempts. Twenty a's: about a million. Thirty a's: your tab freezes.

Real-world example: ^([\w.]+)+@ for email validation. The nested + on overlapping character classes is the same trap.

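How to avoid it: don't wrap a quantifier around a subpattern that is itself quantified and can match the same text. (a+)+b matches exactly the same strings as a+b, and ^([\w.]+)+@ can be written as ^[\w.]+@. Each rewrite leaves only one way to split the input, so a failed match no longer triggers the exponential search.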

Since this tool runs in your browser, a catastrophic pattern freezes your tab — not a server. Reload if it hangs.
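A rough way to see the growth is to time the nested pattern against a flat pattern that accepts the same strings. This is a sketch: timeTest is a hypothetical helper, a+b is assumed here as the flat equivalent of (a+)+b, and the timings depend entirely on your machine and engine. The input sizes are kept small so nothing freezes.

```javascript
// Hypothetical helper: time a single regex test in milliseconds.
function timeTest(re, input) {
  const start = Date.now();
  const matched = re.test(input);
  return { matched, ms: Date.now() - start };
}

const nested = /(a+)+b/; // the pattern from this section
const flat = /a+b/;      // accepts the same strings, no nested quantifier

for (const n of [10, 15, 20]) {
  // The trailing "c" guarantees failure, which forces full backtracking.
  const input = "a".repeat(n) + "c";
  const slow = timeTest(nested, input);
  const fast = timeTest(flat, input);
  console.log(`n=${n} nested=${slow.ms}ms flat=${fast.ms}ms`);
}
```

Raising n a few steps at a time should show the nested time roughly doubling per extra character while the flat time stays negligible.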


Regex in production

Input validation — but don't over-validate. For emails, check for @ and send a confirmation. The "RFC 5322 compliant" regex is 6,000+ characters and still doesn't handle everything.

Log parsing — named capture groups make extraction readable:

/(?<ip>\d+\.\d+\.\d+\.\d+) - - \[(?<date>[^\]]+)\] "(?<method>\w+) (?<path>\S+)/
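As a usage sketch, the pattern above can be wrapped in a small function that returns the named groups as a plain object. The function name parseLogLine and the sample log line are made up for illustration.

```javascript
const LOG_LINE =
  /(?<ip>\d+\.\d+\.\d+\.\d+) - - \[(?<date>[^\]]+)\] "(?<method>\w+) (?<path>\S+)/;

// Hypothetical helper: returns the named groups, or null if the line does not match.
function parseLogLine(line) {
  const match = LOG_LINE.exec(line);
  return match ? { ...match.groups } : null;
}

// Sample line invented for this example.
const sample =
  '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326';

const entry = parseLogLine(sample);
console.log(entry.ip);     // 127.0.0.1
console.log(entry.method); // GET
console.log(entry.path);   // /index.html
```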

IDE find-and-replace — VS Code's regex mode is one of the fastest ways to refactor across files.

URL routing — frameworks like Express use regex under the hood for path parameters.
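To illustrate the idea (not Express's actual implementation), a route template with :name parameters can be compiled into a regex with named groups. The function compileRoute and the route shown are hypothetical.

```javascript
// Hypothetical sketch: compile "/users/:id" into a RegExp with named groups.
function compileRoute(template) {
  const source = template
    .replace(/[.*+?^${}()|[\]\\]/g, "\\$&")       // escape regex metacharacters
    .replace(/:([A-Za-z_]\w*)/g, "(?<$1>[^/]+)"); // :id becomes a named group
  const re = new RegExp(`^${source}$`);

  return (path) => {
    const match = re.exec(path);
    return match ? { ...match.groups } : null;
  };
}

const matchPost = compileRoute("/users/:id/posts/:postId");
console.log(matchPost("/users/42/posts/7")); // { id: '42', postId: '7' }
console.log(matchPost("/users/42"));         // null
```

Each parameter matches [^/]+, so a value can never span a path separator.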

Security: ReDoS — Regular Expression Denial of Service exploits catastrophic backtracking in server-side regex. If you accept user-supplied patterns, sandbox them or use RE2.


Common regex recipes

# Email (good enough — don't overthink it)
^\S+@\S+\.\S+$

# Semantic version
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$

# IPv4 address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$

# ISO date (YYYY-MM-DD)
^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$
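The recipes above can be checked quickly in JavaScript. The sample inputs below are invented for illustration, and the results also show where each recipe is deliberately loose.

```javascript
const recipes = {
  email: /^\S+@\S+\.\S+$/,
  semver: /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$/,
  ipv4: /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/,
  isoDate: /^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$/,
};

// [recipe, input, expected result]
const samples = [
  ["email", "a@b.co", true],
  ["email", "no-at-sign", false],
  ["semver", "1.0.0-rc.1", true],
  ["semver", "01.2.3", false],        // leading zero rejected
  ["semver", "1.2.3+build.5", false], // build metadata is not covered by this recipe
  ["ipv4", "192.168.1.1", true],
  ["ipv4", "256.1.1.1", false],
  ["isoDate", "2024-01-15", true],
  ["isoDate", "2024-13-01", false],
  ["isoDate", "2024-02-30", true],    // shape only: month lengths are not checked
];

const results = samples.map(([name, input, expected]) => ({
  name,
  input,
  expected,
  actual: recipes[name].test(input),
}));

for (const r of results) {
  console.log(`${r.name} ${JSON.stringify(r.input)} -> ${r.actual}`);
}
```

If you need real calendar validation, parse the date after the shape check instead of growing the regex.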

Know when regex is not the right tool. HTML parsing, nested structures, context-free grammars — these need a real parser. If you're writing a regex longer than a line, step back and ask if a parser or a simple string split would be clearer.


Troubleshooting

Pattern works in Python but not here — Different engines. Python uses (?P<name>...) for named groups; JavaScript uses (?<name>...). PCRE features like \K, atomic groups, and possessive quantifiers aren't available in JavaScript's RegExp.
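For the named-group difference specifically, a naive conversion is often enough. The helper below is a sketch with a made-up name. It does a plain text substitution, so it will also rewrite (?P< sequences that appear inside character classes or after an escape.

```javascript
// Hypothetical helper: convert Python-style named groups and named
// backreferences to JavaScript syntax. Plain text substitution only.
function pythonToJsNamedGroups(source) {
  return source
    .replace(/\(\?P</g, "(?<")                      // (?P<name>...) -> (?<name>...)
    .replace(/\(\?P=([A-Za-z_]\w*)\)/g, "\\k<$1>"); // (?P=name)     -> \k<name>
}

const converted = pythonToJsNamedGroups("(?P<word>\\w+) (?P=word)");
console.log(converted); // (?<word>\w+) \k<word>

const repeated = new RegExp(converted);
console.log(repeated.test("hello hello")); // true
console.log(repeated.test("hello world")); // false
```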

Regex hangs the browser — Catastrophic backtracking from nested quantifiers. Simplify the pattern — avoid (a+)+ or (.*a)* constructs. Reload the tab if it's frozen.

. doesn't match newlines: By default, . matches everything except line terminators (\n, \r, \u2028, and \u2029). Enable the s (dotAll) flag, or use [\s\S] as a portable alternative that works in every environment.
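A quick check of the three behaviors, using an invented two-line string:

```javascript
const text = "first\nsecond";

const defaultDot = /first.second/.test(text);        // . stops at the newline
const dotAll = /first.second/s.test(text);           // s flag: . matches any character
const portable = /first[\s\S]second/.test(text);     // class covers every character

console.log(defaultDot); // false
console.log(dotAll);     // true
console.log(portable);   // true
```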

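\b doesn't work with Unicode: JavaScript defines \b in terms of \w, which covers only [A-Za-z0-9_]. The u and v flags on their own do not change that. For Unicode-aware boundaries, enable u or v and build the boundary from lookarounds over Unicode properties, for example (?<![\p{L}\p{N}_]) before the word and (?![\p{L}\p{N}_]) after it.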
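A small sketch of the difference, using "café" (written with \u00e9 escapes so the example does not depend on file encoding). The variable names and sample strings are illustrative.

```javascript
// \b treats "é" as a non-word character, so there is no boundary after it.
const asciiBoundary = /\bcaf\u00e9\b/;

// Unicode-aware boundary built from lookarounds over letters, digits, and "_".
const unicodeBoundary = /(?<![\p{L}\p{N}_])caf\u00e9(?![\p{L}\p{N}_])/u;

console.log(asciiBoundary.test("un caf\u00e9 noir"));   // false
console.log(unicodeBoundary.test("un caf\u00e9 noir")); // true
console.log(unicodeBoundary.test("des caf\u00e9s"));    // false
```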

Matches show but capture groups are empty — You're using non-capturing groups (?:...) instead of capturing groups (...). Remove the ?: to capture.
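The difference shows up directly in the match array. The sample string here is invented.

```javascript
const nonCapturing = "2024-05".match(/(?:\d{4})-(?:\d{2})/);
const capturing = "2024-05".match(/(\d{4})-(\d{2})/);

console.log(nonCapturing.length); // 1 (the full match only)
console.log(capturing.length);    // 3 (full match plus two groups)
console.log(capturing[1]);        // 2024
console.log(capturing[2]);        // 05
```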

Pattern matches too much: Greedy quantifiers (*, +) grab as much as possible. Use lazy variants (*?, +?) or a more specific character class in place of the dot (.).
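A short comparison on an invented snippet (used only to show quantifier behavior, not as a way to parse HTML):

```javascript
const html = "<b>one</b> and <b>two</b>";

const greedy = html.match(/<b>.*<\/b>/)[0];      // runs to the last </b>
const lazy = html.match(/<b>.*?<\/b>/)[0];       // stops at the first </b>
const specific = html.match(/<b>[^<]*<\/b>/)[0]; // cannot cross a "<" at all

console.log(greedy);   // <b>one</b> and <b>two</b>
console.log(lazy);     // <b>one</b>
console.log(specific); // <b>one</b>
```

The character-class version is usually the better fix, because it rules out the unwanted match instead of relying on the engine's search order.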