Regular expressions provide a declarative language to match patterns within strings. They are commonly used for string validation, parsing, and transformation. Specify the language (PHP, Python, etc) or tool (grep, VS Code, Google Analytics, etc) that you are using. Do not post questions asking for an explanation of what a symbol means or what a particular regular expression will match.
IMPORTANT NOTE: Requests to explain a regular expression pattern or construct will be closed as duplicates of the canonical post What does this regex mean which contains a lot of details on regular expression constructs. The post also contains links to many popular online regular expression testers (where the meanings of regex constructs can be found). One such tool is Regex101.
Regular expressions are a powerful formalism for pattern matching in strings. They are available in a variety of dialects (also known as flavors) in a number of programming languages and text-processing tools, as well as many specialized applications. The term "Regular expression" is typically abbreviated as "RegEx" or "regex".
Before asking a question here, please take the time to review the following brief guidelines.
How To Ask
Specify what tool or language you are using
Regexes are everywhere. Different languages like Python, PHP and Java all use regexes, but with minor differences. Many different tools use regexes as well, from `grep` to most text editors to Google Analytics, also with their own differences. Specify the tool or language in your question. (Perhaps see also Why are there so many different regular expression dialects?)
Be clear about what you need.
Keep in mind that regex dialects are different; the lowest common denominator will usually be quite different from what is possible and recommended for a tool with a modern, souped-up regex engine. (See previous section.)
Also, are you looking for a regular expression for input validation (which needs to be rather strict), or do you need one for information extraction (which can be somewhat relaxed)?
If your question relates to regular expressions in the strict computer science/automata theory sense, please state this explicitly.
For most other questions, you should always include sample input, expected output, and an outline of what you have tried, and where you are stuck. Often, an example of what you do not want to match is also very helpful, and important to know.
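The difference between validation and extraction can be sketched in Python as follows. The date-shaped pattern and the helper names are hypothetical, chosen only for illustration; the pattern checks the shape of the text, not whether the date exists.

```python
import re

# Hypothetical pattern for illustration: text shaped like YYYY-MM-DD.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def is_valid_date_field(text):
    """Validation: the whole input must match, with nothing before or after."""
    return DATE.fullmatch(text) is not None

def extract_dates(text):
    """Extraction: collect every occurrence anywhere in a larger text."""
    return DATE.findall(text)

print(is_valid_date_field("2024-01-31"))      # strict: accepted
print(is_valid_date_field("due 2024-01-31"))  # strict: rejected, extra text
print(extract_dates("due 2024-01-31, paid 2024-02-02"))
```

The same pattern serves both purposes here; what changes is whether the match is anchored to the whole input (`fullmatch`) or allowed to float (`findall`).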
Show us what you tried.
A link to one of the many online regex testing tools (see link section) with your attempt and some representative data can do wonders.
However, keep in mind, again, that there are many different regular expression dialects. (See earlier bullet points.) A pattern that works in an online tool for JavaScript or PHP will not necessarily work in Python or Java or `sed` or Awk or ... what have you.
Even if you cannot post your problem online, showing us your best attempt helps us focus on what you need help with.
Search for duplicates.
Before posting, check if your issue has already been solved by somebody else asking something similar. See also the following section.
Avoid Common Problems and Pitfalls
There are some common recurring beginner topics.
Do not assume that the tool you are using supports precisely the syntax of another tool.
While modern Perl/Ruby/Python/PHP/Java regular expression support is widespread, you cannot assume that it is universal. In particular, many older tools (Awk, `sed`, `grep`, `lex`, etc.), as well as some newer ones (JavaScript, many text editors), use different dialects, some of which do not necessarily support e.g. non-capturing parentheses `(?:...)`, non-greedy quantifiers `*?`, backreferences (`\1`, `\2`, etc.), common character class abbreviations (`\t`, `\d`, POSIX character classes `[[:class:]]`), arbitrary repetition `{m,n}`, lookarounds (`(?=...)`, `(?<=...)`, `(?!...)`), etc.
If your question is not specific to any particular implementation, try the language-agnostic tag. This will generally imply a fairly minimal set of operators, corresponding to the ones specified in the common mathematical definition of regular languages.
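One rough way to check what a given engine accepts is simply to try compiling the construct. The sketch below does this for Python's `re` module; the helper name `engine_accepts` and the list of candidates are hypothetical. Note that a pattern compiling without error only shows that the syntax is accepted, not that it means the same thing as in another dialect.

```python
import re

def engine_accepts(pattern):
    """Return True if Python's re module compiles the pattern without error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Constructs to probe; the last one is rejected by Python's re module,
# which requires look-behind patterns to have a fixed width.
candidates = {
    "non-capturing group": r"(?:ab)+",
    "non-greedy quantifier": r"a.*?b",
    "negative lookahead": r"turn(?!ed)",
    "variable-width lookbehind": r"(?<=a+)b",
}

for name, pattern in candidates.items():
    print(name, engine_accepts(pattern))
```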
Understand the difference between "glob" expressions and true regular expressions.
Glob patterns are a less potent pattern matching language, which is commonly used for file name wildcards. In glob, `*` means "anything", while a lone `*` in a regular expression is, in fact, a syntax error in some dialects (though many engines will silently ignore it, rather than issue a warning; and others still will see it as a literal `*`).
For the record, the regex way to say (as much as possible of) "anything" is `.*`, where the "any single character (except newline, usually)" metacharacter `.` is repeated zero or more times (`*`). But see below about how "any character" and greediness are sometimes problematic.
See also What are the differences between glob-style patterns and regular expressions?
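As an illustration of the difference, here is a small Python sketch that applies a glob pattern (via the standard `fnmatch` module) and a roughly equivalent regex to the same made-up file names, and shows that Python's `re` module treats a lone `*` as an error.

```python
import fnmatch
import re

names = ["report.txt", "notes.md", "txt"]  # made-up file names

# Glob: "*" on its own means "any run of characters".
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*.txt")]

# Regex: "." is any character, "*" repeats it, and the literal dot is escaped.
regex_hits = [n for n in names if re.fullmatch(r".*\.txt", n)]

# A lone "*" has nothing to repeat, so Python's re module rejects it.
try:
    re.compile("*")
    lone_star_ok = True
except re.error:
    lone_star_ok = False

print(glob_hits, regex_hits, lone_star_ok)
```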
Specifying a single repetition is unnecessary.
Using `{1}` as a single-repetition quantifier is harmless but never useful. It is basically an indication of inexperience and/or confusion. `h{1}t{1}t{1}p{1}` matches the same string as the simpler expression `http` (or `ht{2}p` for that matter) but as you can see, the redundant `{1}` repetitions only make it harder to read.
Square brackets are commonly misunderstood or misused.
Beginners often attempt to use square brackets for everything, including grouping. While `[Jun][Jul]` may look like a regex for matching months, it actually matches `JJ`, `Ju`, `Jl`, `uJ`, `uu`, `ul`, `nJ`, `nu`, or `nl`; not `Jun` or `Jul`. `[Jun|Jul]` is a wasteful way to write the functionally identical `[|Junl]`: it matches any one character from the set comprising `|`, `J`, `u`, `l`, and `n`.
For the record, `[abc]` defines a character class which matches a single character which can be `a` or `b` or `c`. The proper way to express alternation is `(Jun|Jul|Aug)` in many dialects (though BRE and related dialects will need backslashes; `\(Jun\|Jul\|Aug\)` for traditional `grep` et al.) or, somewhat more parsimoniously, `(Ju[nl]|Aug)`. The round parentheses (as opposed to the square brackets of character classes) perform grouping, and the `|` operator indicates matching alternatives.
See also What is the difference between square brackets and parentheses in a regex?
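The month example can be tried out in Python as a quick check. This is only a sketch using the `re` module; other dialects may need the backslashed forms mentioned above.

```python
import re

char_classes = re.compile(r"[Jun][Jul]")   # two single-character classes
alternation = re.compile(r"(Ju[nl]|Aug)")  # grouping plus alternatives
pipe_in_class = re.compile(r"[Jun|Jul]")   # "|" is just another member of the set

for word in ["Jun", "Jul", "Aug", "nl", "uu"]:
    print(word,
          bool(char_classes.fullmatch(word)),
          bool(alternation.fullmatch(word)))

# Inside square brackets the pipe is an ordinary character.
print(bool(pipe_in_class.fullmatch("|")))
```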
Negation is tricky.
Related to the previous, beginners will use negated character classes to attempt to restrict what can be matched. For example, to match `turn` but not `turned`, the following does not do what you want: `turn[^ed]`. It will match `turn` followed by any single character which is not `e` or `d` (so it will not match `turner`, for example, nor a bare `turn` at the end of the input, because the character class must consume one character).
In fact, the traditional regex does not allow for this to be expressed easily. With ERE, you could say `turn($|[^e]|e$|e[^d])` to say that `turn` can be followed by nothing, or a character which is not `e`, or by `e` if it is not in turn followed by `d`. Modern regular expression dialects have an extension called lookarounds which allow you to say `turn(?!ed)`, but make sure your tool supports this syntax before plunging ahead.
Notice also how the character class negation operator is distinct from the beginning of line anchor (`^[abc]` matches `a`, `b`, or `c` at beginning of the line, whereas `[^abc]` matches a single character which is not `a`, `b`, or `c`).
See also the next bullet point.
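The three `turn` patterns discussed above can be compared side by side. The following is a minimal Python sketch with a made-up word list; it assumes an engine that supports lookahead, which Python's `re` module does.

```python
import re

words = ["turn", "turned", "turner", "turns", "turnip"]  # made-up test words

negated_class = re.compile(r"turn[^ed]")             # needs one more character
ere_style = re.compile(r"turn($|[^e]|e$|e[^d])")     # spelled out without lookaround
lookahead = re.compile(r"turn(?!ed)")                # negative lookahead

results = {
    w: (bool(negated_class.search(w)),
        bool(ere_style.search(w)),
        bool(lookahead.search(w)))
    for w in words
}

for word, flags in results.items():
    print(word, flags)
```

Only the negated character class rejects `turn` and `turner`; the other two patterns reject `turned` alone.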
If there is a way to match, the engine will find it.
A common beginner's mistake is to supply useless optional leading or trailing elements. The trailing `s?` in `dogs?` does nothing to prevent a match on `doggone` or `endogenous`. If you want to prevent those, you will need to elaborate; perhaps something like `dogs?\>` (provided your dialect supports the final word boundary operator and provided that's what you mean).
As it is, the regular expression `dogs?` will match exactly the same strings as just `dog` (though if your application captures the match, only the former will capture a trailing `s` if there is one).
Matches are greedy.
The regex `a.*b` will match the entire string "abbbbbb" because `*` will always match as much as possible. Say `a[^ab]*b` if that's what you mean, or use non-greedy matching if your dialect supports it.
Watch what you capture
If you use grouping parentheses, the parentheses define what is captured into a backreference. If you edit in parentheses for grouping purposes, make sure you are not renumbering your backreferences.
Also, in particular, watch out for `(abc){2,3}` which only captures the last occurrence of `abc` in the matched string. If you want the repetition to be part of the capture, it needs to be inside the parentheses, like this: `((abc){2,3})`
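Both points about capturing can be seen in a short Python sketch (the input string is made up). Note how wrapping the expression in an extra pair of parentheses also shifts the original group from number 1 to number 2.

```python
import re

text = "abcabcabc"  # made-up input

repeated_group = re.fullmatch(r"(abc){2,3}", text)
wrapped_group = re.fullmatch(r"((abc){2,3})", text)

# The repeated group keeps only its last iteration.
print(repeated_group.group(1))

# The outer group holds the whole repetition; the inner one is now group 2.
print(wrapped_group.group(1))
print(wrapped_group.group(2))
```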
Don't use regex for everything!
In particular, using (typically line-oriented) traditional regex tools to handle structured formats like HTML, XML, JSON, configuration files with block structure (Apache, nginx, many name servers, etc.) is likely to fail, or to produce incorrect results in numerous corner cases.
Asking for HTML regexes tends to be met with negative reactions. The reasoning extends to all structured formats. If there is a parser for it, use that instead.
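As an illustration of the parser route, here is a minimal sketch that collects link targets with Python's standard `html.parser` module. The `LinkCollector` class and the sample markup are hypothetical; the sample includes a tag split across lines, upper-case names, and a commented-out link, all of which tend to trip up a naive line-oriented regex.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

sample = '''<p>See <a class="x"
   href="https://example.com/a">one</a> and
<!-- <a href="https://example.com/commented-out">hidden</a> -->
<A HREF='https://example.com/b'>two</A>.</p>'''

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.links)
```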
Further Reading
Learning regular expressions
- Basic concept of how RegularExpression parsing works
- Wikipedia entry on regular expressions
- Regular-Expressions.info (informative website for learning regular expressions)
- RexEgg (a regular expressions tutorial that goes deep into advanced features)
- RegexOne ("learn regular expressions with simple, interactive examples")
- Learn Regex The Hard Way (Online book, new version is in planning phase)
- From realpython.com:
- Regular Expressions: Regexes in Python (Part 1) - Long Read
- Regular Expressions: Regexes in Python (Part 2) - Long Read
- Regular Expressions and Building Regexes in Python - Video Course
- DataCamp: Python Regular Expression Tutorial - Long Read
Books
Documentation for JavaScript
Online sandboxes (for testing and publishing regexes online)
- RegexPlanet (supports a variety of flavors to choose from)
- Regexpal (ECMAScript flavor, as implemented by JavaScript)
- Regexhero (.NET flavor)
- RegexStorm.net (.NET flavor with link sharing capability)
- RegExr v2.1 (in JavaScript)
- RegExr v1.0 (ECMAScript flavor, as implemented by Adobe Flash)
- Rubular (Ruby flavor)
- myregexp.com (Java-applet with source code)
- regexe.com (German; probably Java flavor)
- regex101 (in ECMAScript (JavaScript), Python, PHP (PCRE 16-bit), Golang, Java, generates explanation of pattern)
- regexper.com (generates graphical representation for ECMAScript flavor)
- debuggex (generates graphical representation and shows processing of pattern – JavaScript, Python, and PCRE-compatible)
- pyregex.com (Web validator for Python regular expressions)
- regviz.org (Visual debugging of regular expressions for JavaScript)
- Ultrapico Expresso (a standalone tool for testing .NET regular expressions)
- Pythex (Quick way to test your Python regular expressions)
Online Regex generator (for building Regular Expressions via simplified input)
- Regex Numeric Range Generator (enter a min and max and receive the Regex for it)
Other links
- pcre – Perl Compatible Regular Expressions (PCRE) is a commonly used open-source C library inspired by Perl's regular expressions.
- Regular Expression Library – a searchable library of pre-made regular expressions. If you need to write a commonly-needed regex, searching this library might be more efficient than asking a question here.
- Awesome Regex – curated collection of libraries, tools, frameworks and software
- VerbalExpressions – construct regular expressions with natural language terms
- What does the regex mean? SO post - an FAQ on numerous common regex understanding issues.
- Considering using the regex tag on a lua question? Please read this discussion on Meta first.
Regex Uses:
Regular expressions are useful in a wide variety of text processing tasks, and more generally string processing, where the data need not be textual. Common applications include data validation, data scraping (especially web scraping), data wrangling, simple parsing, the production of syntax highlighting systems, and many other tasks.
While regular expressions would be useful on Internet search engines, processing them across the entire database could consume excessive computer resources depending on the complexity and design of the regex. Although in many cases system administrators can run regex-based queries internally, most search engines do not offer regex support to the public. Notable exceptions: searchcode, or previously Google Code Search, which was shut down in 2012.
Google also offers RE2, a fast, safe, thread-friendly C++ alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python: it does not backtrack and guarantees that running time grows linearly with input size.