Most SEOs use Google Search Console regex for a handful of simple contains filters and leave it there. That's understandable — the regex field in GSC is small and the documentation is thin. But RE2 gives you considerably more power than that, and learning to use it changes how you read organic data.
This post assumes you're already using GSC and have seen the regex option in the filter bar. It's for people who want to go further — segmenting queries by intent, identifying patterns across URL structures, and building filters that actually reflect how your site is organised.
RE2: what you can and cannot do
Google Search Console uses RE2, Google's own regex engine. It's fast and safe, but it deliberately excludes some features you might know from PCRE or JavaScript regex:
- No lookaheads or lookbehinds: `(?=...)` and `(?<=...)` don't work
- No backreferences: `\1`, `\2` etc. aren't supported
- No atomic groups
- Anchors (`^` and `$`), character classes, alternation (`|`), and quantifiers all work normally
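One way to avoid pasting an unsupported pattern into GSC is to screen it for these constructs first. The following is a rough Python sketch rather than anything GSC or RE2 provides: the name `unsupported_features` is hypothetical, the checks are text heuristics and not a real parser (a `(?=` inside a character class would be flagged wrongly), and Python's `re` module is used only to scan the pattern text.

```python
import re

# Heuristic checks for the constructs listed above as unsupported in RE2.
# These scan the pattern text; they do not parse it.
UNSUPPORTED_CHECKS = {
    "lookahead": re.compile(r"\(\?[=!]"),
    "lookbehind": re.compile(r"\(\?<[=!]"),
    "atomic group": re.compile(r"\(\?>"),
    # An odd number of backslashes followed by a digit 1-9, e.g. \1
    "backreference": re.compile(r"(?<!\\)(?:\\\\)*\\[1-9]"),
}


def unsupported_features(pattern: str) -> list[str]:
    """Return the names of RE2-unsupported constructs found in pattern."""
    return [
        name
        for name, check in UNSUPPORTED_CHECKS.items()
        if check.search(pattern)
    ]


print(unsupported_features(r"kettle(?=s)"))        # ['lookahead']
print(unsupported_features(r"(\w+) \1"))           # ['backreference']
print(unsupported_features(r"^(how|what|why)\s"))  # []
```

An empty list only means none of these four constructs was spotted; it is not a guarantee that GSC will accept the pattern.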
Beyond contains: building intent-based query filters
The most useful thing regex gives you in GSC is the ability to define what a query segment actually means, rather than just filtering by keyword string.
Informational vs transactional
Informational queries typically contain certain trigger words. You can filter for these across the entire query dataset:
how|what|why|when|where|who|guide|tutorial|vs|versus|difference
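This gives you a reasonable first cut of your informational traffic. Because the pattern is unanchored it also matches substrings (`how` inside "shower", `who` inside "wholesale"), so wrap the alternation in `\b` word boundaries if that noise matters on your site. Combine it with the Does not match regex version to isolate your transactional and commercial segments. The split is never perfect, but it's a significant improvement over eyeballing page by page.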
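To see how the split behaves before trusting it in a report, the filter can be tried on a handful of queries offline. This is a minimal Python sketch under a few assumptions: the sample queries are invented, `split_by_intent` is a hypothetical helper, and Python's `re` stands in for RE2 (the two behave the same for these simple patterns). It compares the filter as written with a variant wrapped in `\b` word boundaries.

```python
import re

# The filter from the article, plus a stricter variant using \b word boundaries.
WORDS = "how|what|why|when|where|who|guide|tutorial|vs|versus|difference"
LOOSE = re.compile(WORDS)
STRICT = re.compile(rf"\b({WORDS})\b")

# Hypothetical queries, invented for illustration.
QUERIES = [
    "how to descale a kettle",
    "kettle vs teapot",
    "buy glass shower screen",
    "wholesale kettles",
    "cheap kettle",
]


def split_by_intent(
    pattern: re.Pattern, queries: list[str]
) -> tuple[list[str], list[str]]:
    """Return (matches, non_matches), mirroring the two regex filter modes."""
    matches = [q for q in queries if pattern.search(q)]
    non_matches = [q for q in queries if not pattern.search(q)]
    return matches, non_matches


loose_info, loose_other = split_by_intent(LOOSE, QUERIES)
strict_info, strict_other = split_by_intent(STRICT, QUERIES)

print(loose_info)   # first four queries: "shower" and "wholesale" sneak in
print(strict_info)  # first two queries only
```

The unanchored version pulls in "buy glass shower screen" and "wholesale kettles" because `how` and `who` appear inside longer words; the word-boundary version leaves them in the transactional bucket.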
Question queries
^(how|what|why|when|where|who|is|are|can|does|do|should)\s
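The anchor (`^`) and the trailing `\s` tighten this considerably: you're catching queries that start with one of these words, rather than queries that merely contain them somewhere. The `\s` also stops `is` from matching the start of a word like "island". Useful for identifying featured snippet and People Also Ask (PAA) opportunities.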
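Once question queries are isolated, it can be useful to see which question words dominate. The sketch below, in Python, counts the leading word captured by the pattern above. The sample queries and the helper name `leading_question_words` are hypothetical, and Python's `re` is assumed to behave like RE2 for this pattern.

```python
import re
from collections import Counter

QUESTION = re.compile(
    r"^(how|what|why|when|where|who|is|are|can|does|do|should)\s"
)

# Hypothetical queries, invented for illustration.
QUERIES = [
    "how to clean suede",
    "is suede waterproof",
    "island hopping greece",     # starts with "is" but has no space after it
    "suede cleaner how to use",  # contains "how" but does not start with it
    "does suede stretch",
]


def leading_question_words(queries: list[str]) -> Counter:
    """Count queries by the question word they start with."""
    counts: Counter = Counter()
    for query in queries:
        found = QUESTION.match(query)
        if found:
            counts[found.group(1)] += 1
    return counts


counts = leading_question_words(QUERIES)
print(dict(counts))
```

Three of the five sample queries count as questions here; "island hopping greece" and "suede cleaner how to use" are both left out, which is the behaviour the anchor and the `\s` are there to produce.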
Long-tail filtering by word count
GSC doesn't have a word count field, but regex can approximate it. To find queries with at least four words:
\w.+\s\w.+\s\w.+\s\w.+
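Not elegant, but effective. Each `\w.+` unit matches a word character followed by at least one more character, and the `\s` between units requires a space, so the pattern needs three spaces separating chunks of two or more characters. That makes it an approximation: a four-word query containing a single-character word, such as "size 9 red shoes", is missed. Add or remove units for different minimum lengths. RE2's counted repetition gives a tighter version: `^\S+(\s+\S+){3,}$`.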
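The difference between the unit-based pattern and a counted-repetition version is easiest to see on sample data. This is a small Python sketch with invented queries; `min_words_pattern` is a hypothetical helper that builds `^\S+(\s+\S+){n-1,}$` for a chosen minimum, and Python's `re` is assumed to match RE2's behaviour for both patterns.

```python
import re

# The pattern from the article: four "\w.+" units joined by "\s".
ARTICLE_PATTERN = re.compile(r"\w.+\s\w.+\s\w.+\s\w.+")


def min_words_pattern(n: int) -> str:
    """Build a pattern for queries of at least n whitespace-separated words."""
    return rf"^\S+(\s+\S+){{{n - 1},}}$"


COUNTED_PATTERN = re.compile(min_words_pattern(4))

# Hypothetical queries, invented for illustration.
QUERIES = [
    "red shoes",
    "best red running shoes",
    "size 9 red shoes",  # four words, one of them a single character
]

article_hits = [q for q in QUERIES if ARTICLE_PATTERN.search(q)]
counted_hits = [q for q in QUERIES if COUNTED_PATTERN.search(q)]

print(min_words_pattern(4))  # ^\S+(\s+\S+){3,}$
print(article_hits)          # ['best red running shoes']
print(counted_hits)          # ['best red running shoes', 'size 9 red shoes']
```

Changing the argument to `min_words_pattern` gives other thresholds without having to count units by hand.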
Filtering by URL structure
Query filters get most of the attention, but page filters are where regex genuinely earns its place on large sites.
| Pattern | What it matches |
|---|---|
| `/blog/` | All blog URLs (contains match) |
| `^https://example\.com/blog/[^/]+/$` | Only top-level blog posts, not subcategories |
| `/category/\|/tag/\|/author/` | All taxonomy pages |
| `\?` | Any URL with query parameters |
| `/en/\|/en-gb/\|/en-us/` | English-language sections (international sites) |
| `/product/[^/]+/$` | Product detail pages (one level deep) |
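Page patterns like these are worth checking against a few real URLs before relying on them. The Python sketch below runs the six patterns from the table over some invented `example.com` URLs and reports which segments each URL falls into. The segment labels and the `segments_for` helper are hypothetical, and Python's `re` is assumed to behave like RE2 here.

```python
import re

# Patterns from the table above, keyed by hypothetical segment labels.
PAGE_SEGMENTS = {
    "blog": r"/blog/",
    "top-level blog post": r"^https://example\.com/blog/[^/]+/$",
    "taxonomy": r"/category/|/tag/|/author/",
    "has parameters": r"\?",
    "english": r"/en/|/en-gb/|/en-us/",
    "product detail": r"/product/[^/]+/$",
}
COMPILED = {label: re.compile(rx) for label, rx in PAGE_SEGMENTS.items()}


def segments_for(url: str) -> list[str]:
    """Return the labels of every segment whose pattern matches the URL."""
    return [label for label, rx in COMPILED.items() if rx.search(url)]


# Hypothetical URLs, invented for illustration.
print(segments_for("https://example.com/blog/my-post/"))
# ['blog', 'top-level blog post']
print(segments_for("https://example.com/blog/news/my-post/"))
# ['blog']
print(segments_for("https://example.com/product/blue-kettle/reviews/"))
# []
```

The second and third URLs show the effect of `[^/]+/$`: anything nested one level deeper drops out of the "top-level" and "product detail" segments.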
For international sites especially, being able to isolate a single language segment in GSC and filter queries within it is dramatically more useful than looking at global data. You can see which markets are driving which query types, where content gaps exist, and where cannibalisation is occurring between regional variants.
The negative match: your most underused filter
The Does not match regex option is where the real segmentation power lives. It lets you build exclusion filters that clean up your data in ways the basic interface simply can't do.
Some practical examples:
- Exclude brand queries: `your brand|yourbrand|brand variant` (gives you clean non-brand performance)
- Exclude navigational queries: `login|sign in|account|dashboard|contact`
- Exclude known low-value query types: `\d{4,}` (queries containing numbers of four or more digits, often order IDs, tracking codes, or low-intent lookups)
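Exclusion patterns are easy to get subtly wrong, so it helps to test them on sample queries first. This Python sketch emulates the Does not match regex mode. The brand terms ("acme" and its variants), the sample queries, and the helpers `exclusion_pattern` and `does_not_match` are all hypothetical, and the terms are assumed to be plain words with no regex metacharacters.

```python
import re

# Hypothetical brand terms; the navigational terms come from the list above.
BRAND_TERMS = ["acme", "acme kettles", "acmekettles"]
NAV_TERMS = ["login", "sign in", "account", "dashboard", "contact"]


def exclusion_pattern(*term_groups: list[str]) -> str:
    """Join plain-word terms into one alternation for a single regex filter."""
    return "|".join(term for group in term_groups for term in group)


def does_not_match(pattern: str, queries: list[str]) -> list[str]:
    """Emulate 'Does not match regex': keep queries the pattern misses."""
    compiled = re.compile(pattern)
    return [q for q in queries if not compiled.search(q)]


# Hypothetical queries, invented for illustration.
QUERIES = [
    "acme kettle login",
    "best kettle 2024",
    "order 10482 status",
    "quiet electric kettle",
]

non_brand = does_not_match(exclusion_pattern(BRAND_TERMS, NAV_TERMS), QUERIES)
no_long_numbers = does_not_match(r"\d{4,}", QUERIES)

print(non_brand)
print(no_long_numbers)
```

Note what the second filter does to "best kettle 2024": a four-digit year is also a run of four or more digits, so `\d{4,}` removes it along with the order ID. Whether that is acceptable depends on how much year-qualified traffic the site gets.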
Combining filters
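GSC lets you stack filters across dimensions (the Performance report takes one filter per dimension, so multiple query exclusions have to be merged into a single pattern with `|`). A filter combination I use regularly on e-commerce sites: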
- Page matches regex: `/product/`
- Query does not match regex: `brand name|product SKU patterns`
- Country: target market
This gives you non-brand, non-navigational organic performance for your product range in a specific market. The data that's left is the meaningful stuff — actual discovery traffic, comparison queries, purchase-intent terms — and it's considerably smaller and more actionable than the raw aggregate.
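The same combination can be reproduced on exported data, which is handy for checking that the filters do what you expect. This is a Python sketch under stated assumptions: the `Row` shape (page, query, country, clicks), the three-letter country codes, the brand term "acme", and the SKU format `kt-` followed by digits are all hypothetical stand-ins for whatever your own export and catalogue look like.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One row of a hypothetical export with these four columns."""
    page: str
    query: str
    country: str
    clicks: int


def filter_rows(
    rows: list[Row], page_rx: str, query_exclude_rx: str, country: str
) -> list[Row]:
    """Page matches regex, query does not match regex, country equals."""
    page_re = re.compile(page_rx)
    exclude_re = re.compile(query_exclude_rx)
    return [
        row
        for row in rows
        if row.country == country
        and page_re.search(row.page)
        and not exclude_re.search(row.query)
    ]


# Hypothetical rows, invented for illustration.
ROWS = [
    Row("https://example.com/product/blue-kettle/", "quiet electric kettle", "gbr", 40),
    Row("https://example.com/product/blue-kettle/", "acme blue kettle", "gbr", 25),
    Row("https://example.com/product/blue-kettle/", "quiet electric kettle", "usa", 12),
    Row("https://example.com/blog/kettle-guide/", "how to descale a kettle", "gbr", 30),
    Row("https://example.com/product/red-kettle/", "kt-2041 kettle", "gbr", 5),
]

kept = filter_rows(ROWS, r"/product/", r"acme|kt-\d+", "gbr")
print([row.query for row in kept], sum(row.clicks for row in kept))
```

Of the five sample rows, only the first survives all three filters, which illustrates how much smaller the remaining dataset tends to be.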
Regex in GSC isn't a workaround. At the scale most enterprise sites operate at, it's the only practical way to make the data legible.