Regex Patterns That Actually Work in the Real World
Every developer has a regex graveyard — patterns that looked right, passed a few manual tests, then silently failed on inputs nobody thought to try. This article is a collection of battle-tested patterns for the tasks that come up constantly: email addresses, URLs, phone numbers, dates, colours, passwords. For each one, you'll see the pattern, the flags, what it matches, and crucially — what it doesn't.
Email Addresses
Email validation is the most cargo-culted regex task in existence. There are patterns that are technically RFC 5322 compliant and span multiple pages. Nobody uses them in production. Here's a pattern that covers 99.9% of real email addresses without being a nightmare to read: /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/i
The i flag makes it case-insensitive. Strictly speaking it's redundant here, since the character classes already list both a-z and A-Z, but it keeps the pattern safe if someone later trims the classes to [a-z], which matters more than you'd think for TLDs like .COM pasted from corporate email footers.
For full-string validation (checking whether an entire input is an email, not just finding emails within text), wrap it in anchors:
```javascript
// JavaScript validation
const emailRe = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/i;
console.log(emailRe.test('user@example.com')); // true
console.log(emailRe.test('not-an-email')); // false
```
Regex doesn't confirm a mailbox exists. It only checks format. Someone can enter valid@format.com and it'll pass — but the mailbox may not exist. For user registration, always send a confirmation email instead of relying on regex alone.
URLs
URL matching is tricky because valid URLs contain a huge variety of characters in the path, query string, and fragment. The pattern below catches http and https URLs reliably in a body of text:
The g flag finds all URLs in a block of text.
If you're working in a browser or Node.js, the built-in URL constructor is often a better choice than regex for validating or parsing a single URL. It follows the WHATWG URL parsing rules and throws a TypeError on invalid input: try { new URL(str) } catch { /* invalid */ }. It accepts any scheme, though, so check url.protocol if you only want http and https.
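Here is a sketch that does both jobs: a lenient regex for finding http(s) links in running text, and a URL-constructor check for validating one string. The regex is an illustrative choice rather than a definitive pattern, and urlRe, findUrls, and isHttpUrl are hypothetical names:

```javascript
// Illustrative lenient URL finder: "http://" or "https://" followed by a run of
// characters that aren't whitespace, angle brackets, or quotes. The final character
// may not be common trailing punctuation, so "see http://test.org." stops before the dot.
const urlRe = /\bhttps?:\/\/[^\s<>"']*[^\s<>"'.,;:!?)]/gi;

function findUrls(text) {
  // With the g flag, match() returns every full match, or null when there are none.
  return text.match(urlRe) ?? [];
}

// Strict check for a single string: let the URL constructor parse it, then
// restrict the scheme, because URL also accepts ftp:, mailto:, and others.
function isHttpUrl(str) {
  try {
    const url = new URL(str);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false; // URL throws a TypeError on input it cannot parse
  }
}

const sample = 'Docs at https://example.com/guide?page=2#intro, or see http://test.org.';
console.log(findUrls(sample));
// → ['https://example.com/guide?page=2#intro', 'http://test.org']
console.log(isHttpUrl('https://example.com')); // true
console.log(isHttpUrl('ftp://example.com'));   // false
console.log(isHttpUrl('not a url'));           // false
```

Excluding ) from the last character means a URL that legitimately ends in a parenthesis gets truncated; that trade-off favours prose like "(see https://example.com)".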
Phone Numbers
Phone numbers are notoriously format-diverse. Unless you're dealing exclusively with a known locale, the best approach is a lenient pattern that accepts common separators rather than trying to enumerate every possible format:
For international phone validation, consider a library like libphonenumber-js instead of regex. The international phone number format space is too large to regex reliably.
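A lenient check along those lines might look like the sketch below. The length bounds and the separate digit count are assumptions for illustration (15 digits is the E.164 maximum; the minimum of 7 is arbitrary), and looksLikePhone is a hypothetical name:

```javascript
// Illustrative lenient phone check: an optional leading "+", then 7 to 20
// characters drawn from digits and common separators (space, dot, hyphen, parentheses).
const phoneRe = /^\+?[\d\s().\-]{7,20}$/;

function looksLikePhone(input) {
  const candidate = input.trim();
  if (!phoneRe.test(candidate)) return false;
  // The character class alone would accept "(((---)))", so count real digits too.
  const digitCount = candidate.replace(/\D/g, '').length;
  return digitCount >= 7 && digitCount <= 15;
}

for (const s of ['+1 (555) 123-4567', '555.123.4567', '020 7946 0958', '12-34', '(((---)))', 'call me']) {
  console.log(s, looksLikePhone(s));
}
// +1 (555) 123-4567 true
// 555.123.4567 true
// 020 7946 0958 true
// 12-34 false
// (((---))) false
// call me false
```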
Dates
ISO 8601 dates (YYYY-MM-DD) are structured enough for a precise regex with capture groups for each component:
Group $1 is the year, $2 the month, and $3 the day.
```javascript
// Reformat ISO to DD/MM/YYYY with replace
const iso = '2026-06-06';
const formatted = iso.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1');
// → "06/06/2026"
```
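A digits-only pattern such as (\d{4})-(\d{2})-(\d{2}) happily accepts 2026-13-45. A sketch of a stricter variant, assuming you want to reject impossible dates: the regex narrows months to 01-12 and days to 01-31, and a small leap-year helper handles month lengths, which regex can't express cleanly. isoDateRe and isValidIsoDate are illustrative names:

```javascript
// Stricter ISO 8601 calendar date (YYYY-MM-DD): month 01-12, day 01-31.
const isoDateRe = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

const DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

function isLeapYear(year) {
  // Gregorian rule: divisible by 4, except centuries not divisible by 400.
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

function isValidIsoDate(str) {
  const m = isoDateRe.exec(str);
  if (!m) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // The regex has already capped day at 31; now apply the real month length.
  const maxDay = month === 2 && isLeapYear(year) ? 29 : DAYS_IN_MONTH[month - 1];
  return day <= maxDay;
}

console.log(isValidIsoDate('2026-06-06')); // true
console.log(isValidIsoDate('2024-02-29')); // true (leap year)
console.log(isValidIsoDate('2026-02-29')); // false
console.log(isValidIsoDate('2026-13-01')); // false
```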
Hex Colours
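CSS hex colours come in 3-digit and 6-digit forms. Both are easy to match in a single pattern: #([a-fA-F0-9]{6}|[a-fA-F0-9]{3})\b
It matches a literal # followed by six or three hex digits. The \b word boundary prevents a match that stops partway through a longer hex string, so #abcd is rejected rather than matched as #abc.
This pattern does not match 8-digit hex colours with alpha (#RRGGBBAA). Add a third alternation for that: #([a-fA-F0-9]{8}|[a-fA-F0-9]{6}|[a-fA-F0-9]{3})\b. CSS also allows a 4-digit #RGBA shorthand, which would need a {4} alternative as well.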
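To pull every colour out of a stylesheet you'd typically add the g flag and iterate with matchAll. A sketch using the 8-digit-aware pattern, plus a hypothetical expandHex helper that normalises #RGB shorthand to #RRGGBB:

```javascript
// The 8-digit-aware pattern, with g added so matchAll can find every colour.
const hexRe = /#([a-fA-F0-9]{8}|[a-fA-F0-9]{6}|[a-fA-F0-9]{3})\b/g;

// Hypothetical helper: "#ABC" -> "#aabbcc"; 6- and 8-digit forms are only lowercased.
function expandHex(hex) {
  const digits = hex.slice(1).toLowerCase();
  if (digits.length === 3) {
    return '#' + [...digits].map((c) => c + c).join('');
  }
  return '#' + digits;
}

const css = '.a { color: #FFF; border: 1px solid #1a2b3c; } .b { background: #11223380; } .c { color: #abcd; }';
const colours = [...css.matchAll(hexRe)].map((m) => expandHex(m[0]));
console.log(colours); // ['#ffffff', '#1a2b3c', '#11223380'] (the 4-digit #abcd is skipped)
```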
Password Strength
Password validation with regex uses lookaheads to enforce multiple independent conditions in a single pattern:
!@#$%^&*, and minimum 8 characters total. Each (?=...) is a lookahead: it checks for its condition without consuming characters, so all four conditions must be satisfied somewhere in the string.
Regex-based password rules are a UX problem. NIST SP 800-63B guidance (2025) recommends allowing any printable character, checking against known-breached lists (like HaveIBeenPwned), and setting a high minimum length (12+) over complex composition rules. The pattern above is common but not necessarily good policy.
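A sketch of the lookahead technique, assuming the four conditions are a lowercase letter, an uppercase letter, a digit, and a character from !@#$%^&*. Since a single regex can only answer yes or no, the second half splits the same checks into a hypothetical rules list so a form can tell users exactly what is missing, which is kinder UX than a bare rejection:

```javascript
// One pattern: four lookaheads (assumed conditions), then the length check.
const strongRe = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

console.log(strongRe.test('Passw0rd!'));  // true
console.log(strongRe.test('password1!')); // false: no uppercase letter
console.log(strongRe.test('Pa0!'));       // false: shorter than 8 characters

// The same conditions as separate checks, so each failure can be reported.
const rules = [
  { test: (pw) => /[a-z]/.test(pw), message: 'a lowercase letter' },
  { test: (pw) => /[A-Z]/.test(pw), message: 'an uppercase letter' },
  { test: (pw) => /\d/.test(pw), message: 'a digit' },
  { test: (pw) => /[!@#$%^&*]/.test(pw), message: 'one of !@#$%^&*' },
  { test: (pw) => pw.length >= 8, message: 'at least 8 characters' },
];

function missingRequirements(pw) {
  return rules.filter((rule) => !rule.test(pw)).map((rule) => rule.message);
}

console.log(missingRequirements('password1!')); // ['an uppercase letter']
```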
IPv4 Addresses
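A minimal sketch of a full validator, assuming the octet alternation 25[0-5]|2[0-4]\d|[01]?\d\d? is repeated four times, joined by literal dots, and anchored with ^ and $ for full-string checks. Building it from a string avoids copying the octet four times; octet and ipv4Re are illustrative names:

```javascript
// One octet, 0-255. Backslashes are doubled because this is a string, not a regex literal.
const octet = '(?:25[0-5]|2[0-4]\\d|[01]?\\d\\d?)';
// Three "octet." groups followed by a final octet, anchored for full-string validation.
const ipv4Re = new RegExp(`^(?:${octet}\\.){3}${octet}$`);

for (const s of ['192.168.1.1', '255.255.255.255', '256.1.1.1', '1.2.3', '999.0.0.1']) {
  console.log(s, ipv4Re.test(s));
}
// 192.168.1.1 true
// 255.255.255.255 true
// 256.1.1.1 false
// 1.2.3 false
// 999.0.0.1 false
```

To scan text instead of validating a whole input, swap the ^ and $ anchors for \b.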
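25[0-5] handles 250–255, 2[0-4]\d handles 200–249, and [01]?\d\d? handles 0–199. That is more precise than \d{1,3}, which would also match 999. Note that the last branch accepts leading zeros such as 010, which some tools interpret as octal.
Markdown Links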
Extracting or transforming Markdown links is a common task in documentation tools, link checkers, and static site generators:
Group $1 is the link text (everything inside []) and $2 is the URL (everything inside ()). The negated character classes [^\]]+ and [^)]+ stop at the closing bracket, whereas a greedy .* would run on to the last ] or ) on the line.
```javascript
// Extract all links from Markdown text
const md = 'Visit [Google](https://google.com) or [Bing](https://bing.com).';
const linkRe = /\[([^\]]+)\]\(([^)]+)\)/g;
let m;
while ((m = linkRe.exec(md)) !== null) {
  console.log(`Text: ${m[1]}, URL: ${m[2]}`);
}
// Text: Google, URL: https://google.com
// Text: Bing, URL: https://bing.com

// Convert Markdown links to HTML anchor tags
const html = md.replace(linkRe, '<a href="$2">$1</a>');
// → 'Visit <a href="https://google.com">Google</a> or <a href="https://bing.com">Bing</a>.'
```
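One limitation worth knowing: [^)]+ stops at the first ), so a URL that itself contains parentheses gets cut short. A sketch of a variant that allows one level of balanced parentheses inside the URL (the example URL and the names are made up for illustration):

```javascript
const simpleRe = /\[([^\]]+)\]\(([^)]+)\)/g;
// Inside (), accept either a non-paren character or a balanced "(...)" group.
const nestedRe = /\[([^\]]+)\]\(((?:[^()]|\([^()]*\))+)\)/g;

const doc = 'See [Regex](https://example.com/wiki/Regex_(computing)) for details.';
const urlsFrom = (re) => [...doc.matchAll(re)].map((m) => m[2]);

console.log(urlsFrom(simpleRe)); // ['https://example.com/wiki/Regex_(computing']
console.log(urlsFrom(nestedRe)); // ['https://example.com/wiki/Regex_(computing)']
```

Deeper nesting would need yet another layer, which is a sign you've reached the point where a Markdown parser is the better tool.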
Duplicate Words
This pattern is a great illustration of back-references — matching something defined by what was already captured:
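A minimal sketch, assuming the pattern takes the common form \b(\w+)\s+\1\b with the g and i flags (a word, whitespace, then the same word again). The sample sentence and variable names are illustrative:

```javascript
// \b(\w+) captures a whole word; \s+\1 requires whitespace and then that same word.
// The trailing \b stops "the theory" from counting as a repeat.
const dupRe = /\b(\w+)\s+\1\b/gi;

const prose = 'Paris in the the spring. The the end is near.';
console.log([...prose.matchAll(dupRe)].map((m) => m[0])); // ['the the', 'The the']

// Keep one copy: replace the whole match with group 1.
console.log(prose.replace(dupRe, '$1')); // 'Paris in the spring. The end is near.'
```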
(\w+) captures a word into group 1. \1 is a back-reference that matches the exact same text again. The i flag makes "The the" match as well as "the the", because back-references are compared case-insensitively under i.
Common Mistakes to Avoid
Matching too much with greedy quantifiers
Against <div>content</div><div>more</div>, the pattern <div>.*</div> (greedy) captures the entire string in one match because .* expands as far as possible. Switch to .*? (lazy) or [^<]* (character exclusion) to stop at the right boundary.
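Running the three variants against the example string makes the difference concrete (expected results shown as comments):

```javascript
const markup = '<div>content</div><div>more</div>';

// Greedy: .* runs to the end of the string, then backtracks to the LAST </div>.
console.log(markup.match(/<div>.*<\/div>/)[0]);
// → '<div>content</div><div>more</div>'

// Lazy: .*? stops at the FIRST </div> that lets the match succeed.
console.log(markup.match(/<div>.*?<\/div>/g));
// → ['<div>content</div>', '<div>more</div>']

// Exclusion: [^<]* can never cross into the next tag at all.
console.log(markup.match(/<div>[^<]*<\/div>/g));
// → ['<div>content</div>', '<div>more</div>']
```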
Missing anchors on full-string validation
Without ^ and $, the email pattern /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/ makes test() succeed on any input that merely contains an email-shaped substring, such as junk text definitely-not@valid.email more junk. Add anchors when validating user input, not when scanning a document.
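To see the difference, compare the two forms on an input with text around the address (the sample string is made up):

```javascript
const unanchored = /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/;
const anchored = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;

const input = 'Reach me at me@example.com, thanks!';
console.log(unanchored.test(input)); // true: an email-shaped substring exists
console.log(anchored.test(input));   // false: the whole input must be an email
```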
Forgetting the global flag
Without g, string.replace() replaces only the first match, and string.match() returns only the first match (as an array with its capture groups) rather than an array of all matches. Know which behaviour you need before you write the pattern.
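A short illustration of both behaviours, reusing the ISO date format from earlier:

```javascript
const s = 'a-b-c';
console.log(s.replace(/-/, '+'));  // 'a+b-c' (first match only)
console.log(s.replace(/-/g, '+')); // 'a+b+c'

const dates = 'Start: 2026-06-06, end: 2026-12-31.';
// Without g: first match only, with groups at indexes 1, 2, 3.
const first = dates.match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(first[0], first[1]); // 2026-06-06 2026
// With g: every full match, but no groups.
console.log(dates.match(/(\d{4})-(\d{2})-(\d{2})/g)); // ['2026-06-06', '2026-12-31']
```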
Using regex for HTML parsing
Regex cannot parse HTML reliably. HTML is not a regular language: nesting, optional closing tags, CDATA sections, and attribute edge cases all break regex-based parsers. Use a real DOM parser (DOMParser in the browser, jsdom or cheerio in Node.js) for anything beyond simple pattern matching on small, known inputs.
JavaScript Regex API Quick Reference
| Method | Called on | Returns | Notes |
|---|---|---|---|
| re.test(str) | RegExp | boolean | Quickest check: does a match exist? With g, each call advances re.lastIndex. |
| str.match(re) | String | Array or null | Without g: first match plus groups. With g: all full matches, no groups. |
| str.matchAll(re) | String | Iterator | All matches including groups. Requires the g flag (throws a TypeError otherwise). |
| re.exec(str) | RegExp | Array or null | Use in a loop with g to get every match with its groups. |
| str.replace(re, rep) | String | String | rep can be a string ($1 refs) or a function. Without g, only the first match is replaced. |
| str.replaceAll(re, rep) | String | String | Requires the g flag (throws a TypeError otherwise). |
| str.search(re) | String | Index or -1 | Index of the first match; ignores g and lastIndex. |
| str.split(re) | String | Array | Splits on a regex delimiter; any capture groups are spliced into the result. |
```javascript
// matchAll: the best way to get all matches with groups
const dateRe = /(\d{4})-(\d{2})-(\d{2})/g;
const text = 'Start: 2026-06-06, end: 2026-12-31.';
for (const m of text.matchAll(dateRe)) {
  console.log(`Full: ${m[0]}, Y=${m[1]}, M=${m[2]}, D=${m[3]}, at index ${m.index}`);
}

// replace with a function: arguments are the full match, then each group
const result = text.replace(dateRe, (full, y, m, d) => `${d}/${m}/${y}`);
// → "Start: 06/06/2026, end: 31/12/2026."
```
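One related gotcha: a RegExp with the g flag is stateful. test() and exec() begin searching at re.lastIndex and update it after every call, so reusing one global regex for yes/no checks can flip results between calls. A small demonstration:

```javascript
const re = /cat/g;

console.log(re.test('cat'), re.lastIndex); // true 3  (next search starts at index 3)
console.log(re.test('cat'), re.lastIndex); // false 0 (no match from index 3, so lastIndex resets)
console.log(re.test('cat'), re.lastIndex); // true 3

// For a simple yes/no check, drop the g flag (or set re.lastIndex = 0 before each call).
const plain = /cat/;
console.log(plain.test('cat'), plain.test('cat')); // true true
```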
Python and PHP Equivalents
All the patterns above work in Python and PHP with minor adjustments:
```python
import re

# Python: same patterns, slightly different API
email_re = re.compile(r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}', re.I)
matches = email_re.findall('Contact us at hello@example.com or info@domain.co.uk')
# → ['hello@example.com', 'info@domain.co.uk']

# Named groups in Python
date_re = re.compile(r'\b(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\b')
m = date_re.search('Today is 2026-06-06')
if m:
    print(m.group('year'), m.group('month'), m.group('day'))
    # → 2026 06 06
```
```php
<?php
// PHP: preg_* functions use PCRE
$email = '/[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/i';
preg_match_all($email, 'hello@example.com and info@domain.co.uk', $matches);
// $matches[0] → ['hello@example.com', 'info@domain.co.uk']

// Replace dates
$result = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', '2026-06-06');
// → '06/06/2026'
?>
```
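For comparison, JavaScript (ES2018 and later) writes named groups as (?<name>...) rather than Python's (?P<name>...), exposes them on match.groups, and lets replacement strings refer to them as $<name>. A sketch mirroring the Python date example:

```javascript
const dateRe = /\b(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})\b/;

const m = dateRe.exec('Today is 2026-06-06');
if (m) {
  const { year, month, day } = m.groups;
  console.log(year, month, day); // 2026 06 06
}

// Named references in the replacement string
console.log('2026-06-06'.replace(dateRe, '$<day>/$<month>/$<year>')); // 06/06/2026
```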
When to Use Regex (and When Not To)
Regex is the right tool when you need to find, extract, or transform patterns in text, especially patterns with structure you can describe precisely. It's fast, expressive, and available in virtually every language.
It's the wrong tool when the input format is too complex to describe as a regular language: HTML, XML, JSON, CSS, programming language source code, natural language text. Use parsers for those (DOMParser for HTML in the browser, JSON.parse for JSON). The meme is true: parsing HTML with regex summons unspeakable horrors.
The patterns in this article cover the sweet spot — structured strings that are regular enough for regex to be both readable and reliable. Copy them, test them, and adjust the flags to fit your context.