The Complete Guide to Regular Expressions
This page is extra context on top of the full article under the tool. Nothing here replaces that copy — it adds history, engine internals, cross-language differences, and production angles.
A brief history of regex
Ken Thompson built regular expressions into the QED editor in 1968 and carried them over to ed, the Unix line editor. ed's g/re/p command (globally search for a regular expression and print) gave grep its name. sed, awk, and lex followed, each extending the syntax slightly.
Perl (1987) changed everything. Larry Wall embedded regex so deeply into the language that patterns became a first-class construct. Perl added lookahead, lookbehind, non-greedy quantifiers, named captures, and dozens of other features that people now assume "regex" includes.
PCRE (Perl-Compatible Regular Expressions) reimplemented Perl's regex syntax and semantics as a standalone C library so other programs could use them. PHP, Nginx, and many others adopted it directly.
JavaScript gained RegExp early (JavaScript 1.2 in 1997, standardized in ES3 in 1999), but with a minimal feature set. It's been catching up: lookbehind assertions arrived in ES2018, the d (match indices) flag in ES2022, and the v (Unicode sets) flag in ES2024.
How regex engines work
There are two fundamental approaches:
NFA (nondeterministic finite automaton) — used by JavaScript, Python, Perl, Java, C#, and most other languages. The engine tries one path through the pattern, and if it fails, it backtracks — backs up and tries a different branch. This is what enables features like backreferences and lookaround, but it's also what causes catastrophic backtracking.
DFA (deterministic finite automaton): the approach behind RE2, Go's regexp package, awk, and grep (for patterns without backreferences). The engine processes each input character once, so matching time is linear in the input length and there is no backtracking. The tradeoff: no backreferences, no lookahead or lookbehind.
When you test a regex in this tool, you're using JavaScript's NFA engine.
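As a small illustration of the features that backtracking makes possible, the sketch below uses a backreference and a lookahead in JavaScript. The variable names and sample strings are made up for the example.

```javascript
// Backreference: \1 must repeat exactly what group 1 captured.
const doubledWord = /\b(\w+) \1\b/;
console.log(doubledWord.test("it was the the best")); // true
console.log(doubledWord.test("it was the best"));     // false

// Lookahead: require a digit somewhere ahead without consuming any input.
const hasDigit = /^(?=.*\d)\w+$/;
console.log(hasDigit.test("abc123")); // true
console.log(hasDigit.test("abcdef")); // false
```

Neither pattern would compile in an engine such as RE2, which rejects backreferences and lookaround to keep its linear-time guarantee.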
JavaScript regex vs other engines
JavaScript's RegExp is not PCRE. If you're porting patterns from PHP, Perl, or a PCRE-based tool, expect breakage.
| Feature | JavaScript | PCRE / Perl | Python re | Go / RE2 |
|---|---|---|---|---|
| Named groups | (?<name>...) | (?<name>...) | (?P<name>...) | (?P<name>...) |
| Lookbehind | ✅ (ES2018+) | ✅ | ✅ | ❌ |
| Atomic groups | ❌ | ✅ | ✅ (3.11+) | N/A (no backtracking) |
| Possessive quantifiers | ❌ | ✅ | ✅ (3.11+) | N/A |
| Recursion | ❌ | ✅ | ❌ (use regex pkg) | ❌ |
| Conditional patterns | ❌ | ✅ | ✅ | ❌ |
| Unicode properties (\p{...}) | ✅ (with u/v flag) | ✅ | ❌ (use regex pkg) | ✅ |
If your pattern uses atomic groups or possessive quantifiers (common in PCRE for performance), you'll need to restructure it for JavaScript.
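One common restructuring trick, shown here as a sketch rather than a drop-in rule, emulates an atomic group with a lookahead plus a backreference. It relies on JavaScript lookaheads being atomic: once (?=(a+)) has succeeded, the engine does not re-enter it to try a shorter capture. The pattern names are illustrative.

```javascript
// PCRE original (not valid in JavaScript): ^(?>a+)ab

// A plain group can give back one "a", so the trailing "ab" still matches.
const plainGroup = /^(a+)ab/;

// Emulation: the lookahead locks in the longest run of a's,
// then \1 consumes exactly that run with no way to shorten it.
const atomicLike = /^(?=(a+))\1ab/;
const atomicThenB = /^(?=(a+))\1b/;

console.log(plainGroup.test("aaab"));  // true
console.log(atomicLike.test("aaab"));  // false, as with the PCRE atomic group
console.log(atomicThenB.test("aaab")); // true
```

The emulation introduces an extra capture group, so any later numbered backreferences in the pattern shift by one.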
Catastrophic backtracking explained
Pattern: (a+)+b tested against "aaaaaaaaac".
The inner a+ matches some a's. The outer + repeats. When the engine hits c instead of b, it backtracks and tries every possible way to partition the a's between the inner and outer quantifier. For n a's, that's 2ⁿ combinations. Ten a's: 1,024 attempts. Twenty a's: over a million. Thirty a's: your tab freezes.
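To make the growth concrete, here is a simplified model of the search tree for (a+)+b against n a's followed by c. It is not how V8 actually implements matching; countAttempts is a hypothetical function that only counts how many times a backtracking matcher would reach the point of trying to match b.

```javascript
// Simplified model of (a+)+b against "a".repeat(n) + "c".
function countAttempts(n) {
  let attempts = 0;

  // `pos` is how many a's earlier iterations of the outer group consumed.
  function tryGroup(pos) {
    // The inner a+ is greedy: try the longest run first, then give back.
    for (let len = n - pos; len >= 1; len--) {
      const next = pos + len;
      // First try another iteration of the outer group on the leftover a's.
      if (next < n) tryGroup(next);
      // Then try to match "b" at `next`; it always fails here.
      attempts++;
    }
  }

  tryGroup(0);
  return attempts;
}

for (const n of [5, 10, 20]) {
  console.log(n, countAttempts(n)); // 31, 1023, 1048575
}
```

In this model the count is 2ⁿ − 1, so each additional a roughly doubles the work.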
Real-world example: ^([\w.]+)+@ for email validation. The nested + on overlapping character classes is the same trap.
How to avoid it:
- Don't nest quantifiers on character classes that overlap, such as (a+)+ or (\w+)+
- Use more specific character classes instead of .
- Use + instead of * when you need at least one match
- For server-side code handling untrusted patterns, consider RE2 (linear-time guarantee)
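As a sketch of the first point, the nested email prefix pattern from above can be flattened into a single quantifier. The sample inputs are invented; on them both patterns agree, but only the flattened one stays fast on a long non-matching input.

```javascript
const risky = /^([\w.]+)+@/; // nested quantifier over one character class
const safer = /^[\w.]+@/;    // single quantifier, same accepted prefixes

const samples = ["john.doe@example.com", "a@b", "@missing", "no-at-sign", "x.y.z@"];
for (const s of samples) {
  // Both columns print the same value for every sample.
  console.log(s, risky.test(s), safer.test(s));
}

// Only run the flattened pattern on a long failing input.
console.log(safer.test("a".repeat(30) + "!")); // false, returns immediately
```

The capture group is dropped in the rewrite; if the local part is needed, a single non-nested group such as ^([\w.]+)@ keeps it without the nested repetition.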
Since this tool runs in your browser, a catastrophic pattern freezes your tab — not a server. Reload if it hangs.
Regex in production
Input validation — but don't over-validate. For emails, check for @ and send a confirmation. The "RFC 5322 compliant" regex is 6,000+ characters and still doesn't handle everything.
Log parsing — named capture groups make extraction readable:
/(?<ip>\d+\.\d+\.\d+\.\d+) - - \[(?<date>[^\]]+)\] "(?<method>\w+) (?<path>\S+)/
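A minimal usage sketch for the pattern above. The log line is a made-up example in Common Log Format; the pattern as written expects both the identity and user fields to be "-".

```javascript
const logLine = /(?<ip>\d+\.\d+\.\d+\.\d+) - - \[(?<date>[^\]]+)\] "(?<method>\w+) (?<path>\S+)/;

// Hypothetical access-log entry.
const line = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326';

const match = logLine.exec(line);
if (match) {
  // Named groups are exposed on match.groups.
  const { ip, date, method, path } = match.groups;
  console.log(ip);     // 203.0.113.7
  console.log(date);   // 10/Oct/2024:13:55:36 +0000
  console.log(method); // GET
  console.log(path);   // /index.html
}
```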
IDE find-and-replace — VS Code's regex mode is one of the fastest ways to refactor across files.
URL routing — frameworks like Express use regex under the hood for path parameters.
Security: ReDoS — Regular Expression Denial of Service exploits catastrophic backtracking in server-side regex. If you accept user-supplied patterns, sandbox them or use RE2.
Common regex recipes
# Email (good enough — don't overthink it)
^\S+@\S+\.\S+$
# Semantic version
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$
# IPv4 address
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
# ISO date (YYYY-MM-DD)
^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$
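A quick way to sanity-check these recipes is to run them against a few known inputs. The table of cases below is illustrative, not exhaustive, and it also shows two limits of the recipes: the semantic version pattern has no build-metadata part, and the date pattern checks the shape only, not the calendar.

```javascript
const recipes = {
  email: /^\S+@\S+\.\S+$/,
  semver: /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$/,
  ipv4: /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/,
  isoDate: /^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$/,
};

// [recipe name, input, expected result]
const cases = [
  ["email", "user@example.com", true],
  ["email", "user@localhost", false],   // no dot after the @
  ["semver", "1.2.3-alpha.1", true],
  ["semver", "01.2.3", false],          // leading zero rejected
  ["semver", "1.2.3+build.5", false],   // build metadata not covered
  ["ipv4", "192.168.0.1", true],
  ["ipv4", "256.1.1.1", false],
  ["isoDate", "2024-02-29", true],
  ["isoDate", "2024-13-01", false],
  ["isoDate", "2024-02-31", true],      // shape only, not a real date
];

for (const [name, input, expected] of cases) {
  const ok = recipes[name].test(input) === expected;
  console.log(name, input, ok ? "ok" : "MISMATCH");
}
```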
Know when regex is not the right tool. HTML parsing, nested structures, context-free grammars — these need a real parser. If you're writing a regex longer than a line, step back and ask if a parser or a simple string split would be clearer.
Troubleshooting
Pattern works in Python but not here — Different engines. Python uses (?P<name>...) for named groups; JavaScript uses (?<name>...). PCRE features like \K, atomic groups, and possessive quantifiers aren't available in JavaScript's RegExp.
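For the named-group difference specifically, a mechanical rewrite is often enough. The helper below, toJsNamedGroups, is a hypothetical name and a deliberately naive sketch: it does not skip escaped parentheses or text inside character classes, so check the result by eye on anything non-trivial.

```javascript
// Hypothetical helper: rewrites Python-style named groups for JavaScript.
function toJsNamedGroups(pythonPattern) {
  return pythonPattern
    .replace(/\(\?P</g, "(?<")             // (?P<name>...) -> (?<name>...)
    .replace(/\(\?P=(\w+)\)/g, "\\k<$1>"); // (?P=name)     -> \k<name>
}

const converted = toJsNamedGroups("(?P<word>\\w+) (?P=word)");
console.log(converted); // (?<word>\w+) \k<word>

const doubled = new RegExp(converted).exec("hello hello");
console.log(doubled.groups.word); // hello
```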
Regex hangs the browser — Catastrophic backtracking from nested quantifiers. Simplify the pattern — avoid (a+)+ or (.*a)* constructs. Reload the tab if it's frozen.
. doesn't match newlines: By default, . matches any character except line terminators (\n, \r, \u2028, \u2029). Enable the s (dotAll) flag, or use [\s\S] as a portable alternative that works in every environment.
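The three variants side by side, on a made-up two-line string:

```javascript
const text = "start\nend";

console.log(/start.end/.test(text));      // false: . stops at the line break
console.log(/start.end/s.test(text));     // true: s flag (dotAll)
console.log(/start[\s\S]end/.test(text)); // true: the class matches anything
```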
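\b doesn't work with Unicode: JavaScript's \b is defined in terms of \w, which covers only ASCII letters, digits, and underscore, so accented and non-Latin letters count as non-word characters. The u and v flags do not make \b Unicode-aware. Use them to unlock \p{...} and build the boundary explicitly with lookarounds, for example (?<![\p{L}\p{N}_]) before the word and (?![\p{L}\p{N}_]) after it.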
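A short sketch of the problem and of one possible workaround. The character class [\p{L}\p{N}_] is an assumption about what should count as a word character; it ignores combining marks, so treat it as an approximation.

```javascript
// "été", written with escapes to pin the precomposed form of é.
const word = "\u00e9t\u00e9";

// \b treats é as a non-word character, so it finds boundaries inside the word.
console.log(/\bt\b/.test(word)); // true, although "t" is mid-word

// Unicode-aware boundary built from lookarounds and \p{...} (needs the u flag).
const wordChar = "[\\p{L}\\p{N}_]";
const wholeT = new RegExp(`(?<!${wordChar})t(?!${wordChar})`, "u");

console.log(wholeT.test(word));     // false
console.log(wholeT.test("plan t")); // true
```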
Matches show but capture groups are empty — You're using non-capturing groups (?:...) instead of capturing groups (...). Remove the ?: to capture.
Pattern matches too much: Greedy quantifiers (*, +) grab as much as possible. Use lazy variants (*?, +?) or a more specific character class in place of the dot (.).
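The difference is easiest to see on a string with two quoted parts. The sample sentence is invented for the example.

```javascript
const sentence = 'say "hi" and "bye"';

console.log(sentence.match(/".*"/)[0]);    // "hi" and "bye"  (greedy)
console.log(sentence.match(/".*?"/)[0]);   // "hi"            (lazy)
console.log(sentence.match(/"[^"]*"/)[0]); // "hi"            (specific class)
```

The specific character class usually backtracks less than the lazy form, because it cannot run past the closing quote in the first place.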