Regex Patterns That Actually Work in the Real World

Every developer has a regex graveyard: patterns that looked right, passed a few manual tests, then silently failed on inputs nobody thought to try. This article is a collection of battle-tested patterns for the tasks that come up constantly: email addresses, URLs, phone numbers, dates, colours, passwords. For each one, you'll see the pattern, the flags, what it matches, and, crucially, what it doesn't.

Email Addresses

Email validation is the most cargo-culted regex task in existence. There are patterns that are technically RFC 5322 compliant and span multiple pages. Nobody uses them in production. Here's a pattern that covers the overwhelming majority of real email addresses without being a nightmare to read:

Email Address
/[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/i
Flags: i
Matches the local part before @, the domain name, and a TLD of 2+ characters. The i flag makes it case-insensitive, which matters more than you'd think for TLDs like .COM pasted from corporate email footers.
Matches: user@example.com, user.name+tag@company.co.uk, info@my-domain.io
Does not match: user@, @domain.com, user@domain

For full-string validation (checking if an entire input is an email, not just finding emails within text), wrap it in anchors:

// JavaScript validation
const emailRe = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/i;
console.log(emailRe.test('user@example.com')); // true
console.log(emailRe.test('not-an-email'));      // false
⚠️ Regex doesn't confirm a mailbox exists. It only checks format. Someone can enter valid@format.com and it will pass, even though the mailbox may not exist. For user registration, always send a confirmation email instead of relying on regex alone.

URLs

URL matching is tricky because valid URLs contain a huge variety of characters in the path, query string, and fragment. The pattern below catches http and https URLs reliably in a body of text:

HTTP/HTTPS URL
/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&\/=]*)/gi
Flags: gi
Matches http and https URLs including paths, query strings, hashes, and port numbers. The g flag finds all URLs in a block of text.
Matches: https://thetoolempire.com, http://sub.domain.co.uk/path?q=1&a=2, https://example.com/page#section
Does not match: ftp://files.example.com, //protocol-relative.com
💡 If you're working in a browser, the built-in URL constructor is often a better choice than regex for parsing individual URLs. It handles edge cases correctly and throws on invalid input: try { new URL(str) } catch { /* invalid */ }.
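As a rough sketch of the URL-constructor approach, the helper below accepts only http and https URLs. The name isHttpUrl is made up for this example; the URL constructor itself is built into browsers and Node.js.

```javascript
// Hypothetical helper: true only for parseable http/https URLs.
function isHttpUrl(str) {
  try {
    const url = new URL(str); // throws a TypeError on unparseable input
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false;
  }
}

console.log(isHttpUrl('https://example.com/page#section')); // true
console.log(isHttpUrl('ftp://files.example.com'));          // false (parses, but wrong scheme)
console.log(isHttpUrl('//protocol-relative.com'));          // false (no base URL supplied)
```

Unlike the regex, this checks one candidate string at a time, so it suits validation rather than scanning a block of text.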

Phone Numbers

Phone numbers are notoriously format-diverse. Unless you're dealing exclusively with a known locale, the best approach is a lenient pattern that accepts common separators rather than trying to enumerate every possible format:

US Phone Number
/\+?1?[\s.\-]?\(?\d{3}\)?[\s.\-]?\d{3}[\s.\-]?\d{4}/g
Flags: g
Handles the most common US formats: with and without country code, parentheses, dashes, dots, and spaces as separators.
Matches: 555-123-4567, (555) 123-4567, +1 555.123.4567, 5551234567
Does not match: 123-45-6789

For international phone validation, consider a library like libphonenumber-js instead of regex. The international phone number format space is too large to regex reliably.
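Once a US number passes the lenient pattern, it usually helps to store it in one canonical form. The sketch below is one possible approach, assuming you want a bare 10-digit string; normalizeUsPhone is a hypothetical name, and the pattern is the one above with anchors added.

```javascript
// Anchored version of the US pattern, for validating a single input.
const usPhoneRe = /^\+?1?[\s.\-]?\(?\d{3}\)?[\s.\-]?\d{3}[\s.\-]?\d{4}$/;

// Hypothetical helper: returns the 10 national digits, or null if the format is not recognised.
function normalizeUsPhone(input) {
  const trimmed = input.trim();
  if (!usPhoneRe.test(trimmed)) return null;
  const digits = trimmed.replace(/\D/g, '');               // drop separators, parentheses, and +
  return digits.length === 11 ? digits.slice(1) : digits;  // strip the leading country code 1
}

console.log(normalizeUsPhone('(555) 123-4567'));  // "5551234567"
console.log(normalizeUsPhone('+1 555.123.4567')); // "5551234567"
console.log(normalizeUsPhone('123-45-6789'));     // null
```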

Dates

ISO 8601 dates (YYYY-MM-DD) are structured enough for a precise regex with capture groups for each component:

ISO 8601 Date (YYYY-MM-DD)
/\b(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b/g
Flags: g
Month constrained to 01–12. Day constrained to 01–31. Year is any 4 digits, so you'll want range validation in code for things like "before today" rules. The day is not checked against the month, so 2026-02-31 still matches. Capture groups: $1 = year, $2 = month, $3 = day.
Matches: 2026-06-06, 1999-12-31
Does not match: 2026-13-01, 2026-06-32, 26-06-06
// Reformat ISO to DD/MM/YYYY with replace
const iso = '2026-06-06';
const formatted = iso.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1');
// → "06/06/2026"
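The pattern checks the shape of a date, not the calendar, so an input like 2026-02-31 gets through. One way to close that gap, sketched here with a hypothetical isRealIsoDate helper, is to build a Date from the captured parts and confirm nothing rolled over.

```javascript
// Anchored version of the ISO pattern, for validating a single input.
const isoDateRe = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: true only if the string is a real calendar date.
function isRealIsoDate(str) {
  const m = isoDateRe.exec(str);
  if (!m) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // setUTCFullYear avoids the Date.UTC quirk that maps years 0-99 to 1900-1999.
  const d = new Date(0);
  d.setUTCFullYear(year, month - 1, day);
  // An impossible day (e.g. Feb 31) rolls over into the next month.
  return d.getUTCFullYear() === year &&
         d.getUTCMonth() === month - 1 &&
         d.getUTCDate() === day;
}

console.log(isRealIsoDate('2026-06-06')); // true
console.log(isRealIsoDate('2024-02-29')); // true  (leap year)
console.log(isRealIsoDate('2026-02-31')); // false (passes the regex, fails the calendar)
```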

Hex Colours

CSS hex colours come in 3-digit and 6-digit forms. Both are easy to match in a single pattern:

CSS Hex Color
/#([a-fA-F0-9]{6}|[a-fA-F0-9]{3})\b/gi
Flags: gi
Matches 3-digit shorthand and 6-digit full hex colours. The capture group contains the hex digits without the #. The \b word boundary prevents matching in the middle of a longer hex string.
Matches: #ff6b6b, #fff, #A0B2C3
Does not match: #gg0000, #12345
ℹ️ This pattern does not match 8-digit hex colours with alpha (#RRGGBBAA). Add a third alternation for that: #([a-fA-F0-9]{8}|[a-fA-F0-9]{6}|[a-fA-F0-9]{3})\b.
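The capture group makes it easy to normalise colours after matching. The sketch below, using a made-up normaliseHex helper and an anchored copy of the 3/6-digit pattern, expands shorthand to the full 6-digit lowercase form.

```javascript
// Anchored version of the 3/6-digit pattern, for a single input value.
const hexColourRe = /^#([a-f0-9]{6}|[a-f0-9]{3})$/i;

// Hypothetical helper: returns "#rrggbb" in lowercase, or null for invalid input.
function normaliseHex(input) {
  const m = hexColourRe.exec(input.trim());
  if (!m) return null;
  const digits = m[1].toLowerCase();
  // Shorthand #abc means #aabbcc: double each digit.
  const full = digits.length === 3
    ? [...digits].map(ch => ch + ch).join('')
    : digits;
  return '#' + full;
}

console.log(normaliseHex('#fff'));    // "#ffffff"
console.log(normaliseHex('#A0B2C3')); // "#a0b2c3"
console.log(normaliseHex('#12345'));  // null
```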

Password Strength

Password validation with regex uses lookaheads to enforce multiple independent conditions in a single pattern:

Strong Password
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/
Requires: at least one lowercase letter, at least one uppercase letter, at least one digit, at least one special character from !@#$%^&*, and minimum 8 characters total. Each (?=...) is a lookahead — it checks for the condition without consuming characters, so all four must be satisfied anywhere in the string.
Matches: Passw0rd!, Tr0ub4dor&3
Does not match: password1, PASSWORD!, Short1!
⚠️ Regex-based password rules are a UX problem. NIST SP 800-63B guidance (2025) recommends allowing any printable character, checking against known-breached lists (like HaveIBeenPwned), and setting a high minimum length (12+) over complex rules. The pattern above is common but not necessarily good policy.
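A single combined pattern can only say pass or fail. If you do enforce these rules, splitting each lookahead condition into its own small regex lets you tell the user what is missing. This is a sketch for single-line input; the passwordRules list and missingRules function are illustrative names, not part of any library.

```javascript
// Each rule mirrors one condition from the combined lookahead pattern.
const passwordRules = [
  { label: 'a lowercase letter', re: /[a-z]/ },
  { label: 'an uppercase letter', re: /[A-Z]/ },
  { label: 'a digit', re: /\d/ },
  { label: 'a special character (!@#$%^&*)', re: /[!@#$%^&*]/ },
  { label: 'at least 8 characters', re: /^.{8,}$/ },
];

// Hypothetical helper: returns the labels of every rule the password fails.
function missingRules(password) {
  return passwordRules
    .filter(rule => !rule.re.test(password))
    .map(rule => rule.label);
}

console.log(missingRules('Passw0rd!')); // []
console.log(missingRules('password1')); // [ 'an uppercase letter', 'a special character (!@#$%^&*)' ]
console.log(missingRules('Short1!'));   // [ 'at least 8 characters' ]
```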

IPv4 Addresses

IPv4 Address
/\b((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)\b/g
Flags: g
Each octet validated to 0–255. The 25[0-5] handles 250–255, 2[0-4]\d handles 200–249, [01]?\d\d? handles 0–199. More precise than \d{1,3} which would match 999.
Matches: 192.168.1.1, 10.0.0.255, 0.0.0.0
Does not match: 256.1.1.1, 192.168.1
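When validating a single value rather than scanning text, the range check can also be done numerically, which some people find easier to read than the octet alternation. This is an alternative sketch, not a replacement for the pattern above; isIPv4 is a hypothetical name.

```javascript
// Hypothetical helper: split on dots, then range-check each octet as a number.
function isIPv4(str) {
  const parts = str.split('.');
  return parts.length === 4 &&
    parts.every(part => /^\d{1,3}$/.test(part) && Number(part) <= 255);
}

console.log(isIPv4('192.168.1.1')); // true
console.log(isIPv4('256.1.1.1'));   // false (first octet out of range)
console.log(isIPv4('192.168.1'));   // false (only three octets)
```

Like the regex, this sketch accepts leading zeros such as 010.0.0.1; tighten the per-octet test if that matters for your input.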

Markdown Links

Extracting or transforming Markdown links is a common task in documentation tools, link checkers, and static site generators:

Markdown Link
/\[([^\]]+)\]\(([^)]+)\)/g
Flags: g
Two capture groups: $1 = link text (everything inside []), $2 = URL (everything inside ()). Uses negated character classes to stop at the bracket boundaries rather than greedy matching.
Matches: [Click here](https://example.com), [The Tool Empire](../index.html)
Also matches: the [Image alt](image.png) part of ![Image alt](image.png), because image syntax differs only by the leading !. Filter images out if you need links only.
// Extract all links from Markdown text
const md = 'Visit [Google](https://google.com) or [Bing](https://bing.com).';
const linkRe = /\[([^\]]+)\]\(([^)]+)\)/g;
let m;
while ((m = linkRe.exec(md)) !== null) {
  console.log(`Text: ${m[1]}, URL: ${m[2]}`);
}
// Text: Google, URL: https://google.com
// Text: Bing, URL: https://bing.com

// Convert Markdown links to HTML anchor tags
const html = md.replace(linkRe, '<a href="$2">$1</a>');
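Because Markdown image syntax is link syntax with a leading !, the pattern above also picks up images. One possible fix, assuming an engine with lookbehind support (ES2018), is a negative lookbehind that rejects a [ preceded by !. The variable names here are illustrative.

```javascript
// Same pattern with (?<!!) added: the opening [ must not be preceded by "!".
const linkOnlyRe = /(?<!!)\[([^\]]+)\]\(([^)]+)\)/g;

const mdText = 'See [the docs](docs.html) and ![logo](logo.png).';
const links = [...mdText.matchAll(linkOnlyRe)].map(m => ({ text: m[1], url: m[2] }));

console.log(links); // [ { text: 'the docs', url: 'docs.html' } ]
```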

Duplicate Words

This pattern is a great illustration of back-references — matching something defined by what was already captured:

Duplicate Consecutive Words
/\b(\w+)\s+\1\b/gi
Flags: gi
(\w+) captures a word into group 1. \1 is a back-reference that matches the exact same text again. The i flag makes "The the" match as well as "the the".
Matches: the the, is IS, word word
Does not match: the other
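A typical use for this pattern is cleaning up doubled words in prose. The sketch below uses a slight variant, (?:\s+\1\b)+, so that a run of three or more repeats collapses in one pass; collapseDuplicates is a hypothetical helper name.

```javascript
// Variant of the duplicate-word pattern: one word followed by one or more repeats of itself.
const dupRunRe = /\b(\w+)(?:\s+\1\b)+/gi;

// Hypothetical helper: keep the first occurrence, drop the repeats.
function collapseDuplicates(text) {
  return text.replace(dupRunRe, '$1');
}

console.log(collapseDuplicates('This is is a test of the the the pattern.'));
// "This is a test of the pattern."
```

Intentional repeats such as "that that" are collapsed too, so review the output when editing real prose.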

Common Mistakes to Avoid

Matching too much with greedy quantifiers

Against <div>content</div><div>more</div>, the pattern <div>.*</div> (greedy) captures the entire string in one match because .* expands as far as possible. Switch to .*? (lazy) or [^<]* (character exclusion) to stop at the right boundary.
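The difference is easy to see by running all three variants against the same input. A minimal demonstration, with illustrative variable names:

```javascript
const htmlSnippet = '<div>content</div><div>more</div>';

const greedy = htmlSnippet.match(/<div>.*<\/div>/g);       // .* runs to the LAST </div>
const lazy = htmlSnippet.match(/<div>.*?<\/div>/g);        // .*? stops at the FIRST </div>
const excluded = htmlSnippet.match(/<div>[^<]*<\/div>/g);  // [^<]* cannot cross a tag

console.log(greedy);   // [ '<div>content</div><div>more</div>' ]
console.log(lazy);     // [ '<div>content</div>', '<div>more</div>' ]
console.log(excluded); // [ '<div>content</div>', '<div>more</div>' ]
```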

Missing anchors on full-string validation

Without ^ and $, the email pattern /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/ reports a match for any input that merely contains an email-shaped substring, such as "call me, or try user@example.com <script>". Add anchors when validating user input, not when scanning a document.
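A short comparison of the same email pattern with and without anchors shows the two use cases side by side. The variable names are illustrative only.

```javascript
const findEmailRe = /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/;      // scanning
const validEmailRe = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;   // validation

const userInput = 'call me, or try user@example.com <script>';

console.log(findEmailRe.test(userInput));           // true  (an email appears somewhere inside)
console.log(validEmailRe.test(userInput));          // false (the whole input is not an email)
console.log(validEmailRe.test('user@example.com')); // true
console.log(userInput.match(findEmailRe)[0]);       // "user@example.com"
```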

Forgetting the global flag

Without g, string.replace() replaces only the first match, and string.match() returns just the first match (along with its capture groups) rather than an array of every match. Know which behaviour you need before you write the pattern.
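The following snippet illustrates both behaviours with and without the g flag. The sample strings and variable names are made up for the demonstration.

```javascript
const dashed = 'a-b-c';
const replacedFirst = dashed.replace(/-/, '+');  // only the first dash changes
const replacedAll = dashed.replace(/-/g, '+');   // every dash changes
console.log(replacedFirst); // "a+b-c"
console.log(replacedAll);   // "a+b+c"

const range = 'From 2026-06-06 to 2026-12-31';
const firstOnly = range.match(/(\d{4})-(\d{2})-(\d{2})/);   // first match plus groups
const everyMatch = range.match(/(\d{4})-(\d{2})-(\d{2})/g); // all full matches, no groups
console.log(firstOnly[0], firstOnly[1]); // "2026-06-06" "2026"
console.log(everyMatch);                 // [ '2026-06-06', '2026-12-31' ]
```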

Using regex for HTML parsing

Regex cannot parse HTML reliably. HTML is not a regular language — nesting, optional closing tags, CDATA sections, and attribute edge cases all break regex-based parsers. Use a real DOM parser (DOMParser in browser, jsdom or cheerio in Node.js) for anything beyond simple pattern-matching on small, known inputs.

JavaScript Regex API Quick Reference

| Method | On | Returns | Notes |
| --- | --- | --- | --- |
| re.test(str) | RegExp | boolean | Fastest check: does a match exist? |
| str.match(re) | String | Array or null | Without g: first match + groups. With g: all full matches, no groups. |
| str.matchAll(re) | String | Iterator | All matches including groups. Requires g flag. |
| re.exec(str) | RegExp | Array or null | Use in a loop with g for matches + groups. |
| str.replace(re, rep) | String | String | rep can be a string ($1 refs) or a function. |
| str.replaceAll(re, rep) | String | String | Requires g flag. |
| str.search(re) | String | index or -1 | Returns index of first match. |
| str.split(re) | String | Array | Splits on regex delimiter. |
// matchAll — best way to get all matches with groups
const dateRe = /(\d{4})-(\d{2})-(\d{2})/g;
const text = 'Start: 2026-06-06, end: 2026-12-31.';
for (const m of text.matchAll(dateRe)) {
  console.log(`Full: ${m[0]}, Y=${m[1]}, M=${m[2]}, D=${m[3]}, at index ${m.index}`);
}

// replace with a function
const result = text.replace(dateRe, (full, y, m, d) => `${d}/${m}/${y}`);
// → "Start: 06/06/2026, end: 31/12/2026."
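JavaScript also supports named capture groups (since ES2018), which can make the date example easier to follow than numbered groups. A brief sketch with illustrative variable names:

```javascript
// Named groups: (?<name>...) in the pattern, m.groups.name on the match.
const namedDateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
const sample = 'Start: 2026-06-06, end: 2026-12-31.';

const parsed = [...sample.matchAll(namedDateRe)].map(m => ({ ...m.groups }));
console.log(parsed[0]); // { year: '2026', month: '06', day: '06' }

// In a replacement string, $<name> refers to a named group.
const swapped = sample.replace(namedDateRe, '$<day>/$<month>/$<year>');
console.log(swapped); // "Start: 06/06/2026, end: 31/12/2026."
```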

Python and PHP Equivalents

All the patterns above work in Python and PHP with minor adjustments:

import re

# Python — same patterns, slightly different API
email_re = re.compile(r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}', re.I)
matches = email_re.findall('Contact us at hello@example.com or info@domain.co.uk')
# → ['hello@example.com', 'info@domain.co.uk']

# Named groups in Python
date_re = re.compile(r'\b(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\b')
m = date_re.search('Today is 2026-06-06')
if m:
    print(m.group('year'), m.group('month'), m.group('day'))
    # → 2026 06 06
<?php
// PHP — preg_* functions use PCRE
$email = '/[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/i';
preg_match_all($email, 'hello@example.com and info@domain.co.uk', $matches);
// $matches[0] → ['hello@example.com', 'info@domain.co.uk']

// Replace dates
$result = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', '2026-06-06');
// → '06/06/2026'
?>

When to Use Regex (and When Not To)

Regex is the right tool when you need to find, extract, or transform patterns in text — especially patterns with structure you can describe precisely. It's fast, expressive, and available in every language.

It's the wrong tool when the input format is too complex to describe as a regular language: HTML, XML, JSON, CSS, programming language source code, natural language text. Use parsers for those. The meme is true: parsing HTML with regex summons unspeakable horrors. Use DOMParser.

The patterns in this article cover the sweet spot — structured strings that are regular enough for regex to be both readable and reliable. Copy them, test them, and adjust the flags to fit your context.

The Tool Empire Team
