I'm after some simple examples and best practices of how to use regular expressions in ANSI C. `man regex.h` does not provide that much help.

-
There is no built-in support for regex in ANSI C. What regex library are you using? – Joe Jul 06 '09 at 01:59
-
[Rob Pike](http://en.wikipedia.org/wiki/Rob_Pike) wrote a small regular expression string search function that accepted a very useful subset of regular expressions for the book The Practice of Programming, which he and [Brian Kernighan](http://en.wikipedia.org/wiki/Brian_Kernighan) co-authored. See this discussion, A Regular Expression Matcher, by Dr. Kernighan: http://www.cs.princeton.edu/courses/archive/spr09/cos333/beautiful.html – Richard Chambers Dec 14 '14 at 23:18
5 Answers
Regular expressions actually aren't part of ANSI C. It sounds like you might be talking about the POSIX regular expression library, which comes with most (all?) *nixes. Here's an example of using POSIX regexes in C (based on this):
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

regex_t regex;
int reti;
char msgbuf[100];

/* Compile regular expression */
reti = regcomp(&regex, "^a[[:alnum:]]", 0);
if (reti) {
    fprintf(stderr, "Could not compile regex\n");
    exit(1);
}

/* Execute regular expression */
reti = regexec(&regex, "abc", 0, NULL, 0);
if (!reti) {
    puts("Match");
}
else if (reti == REG_NOMATCH) {
    puts("No match");
}
else {
    regerror(reti, &regex, msgbuf, sizeof(msgbuf));
    fprintf(stderr, "Regex match failed: %s\n", msgbuf);
    exit(1);
}

/* Free memory allocated to the pattern buffer by regcomp() */
regfree(&regex);
Alternatively, you may want to check out PCRE, a library for Perl-compatible regular expressions in C. The Perl syntax is pretty much the same syntax used in Java, Python, and a number of other languages. The POSIX syntax is the syntax used by `grep`, `sed`, `vi`, etc.

-
Unless you need to avoid the dependency, I second PCRE; it has some nice syntax enhancements and is very stable. At least with some older versions of Linux, the "built in" regular expression library isn't too difficult to crash given certain input strings and certain regular expressions that "almost" match or involve a lot of special characters. – bdk Jul 06 '09 at 02:16
-
@Laurence What's the meaning of passing 0 to regcomp? regcomp only takes four integer values 1, 2, 4 and 8 to represent 4 different modes. – lixiang Sep 21 '13 at 07:40
-
@lixiang The last parameter to `regcomp`, `cflags`, is a bitmask. From http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html : "The cflags argument is the bitwise-inclusive OR of zero or more of the following flags...". If you OR-together zero, you'll get 0. I see that the Linux manpage for `regcomp` says "cflags may be the bitwise-or of one or more of the following", which does seem misleading. – Laurence Gonsalves Sep 22 '13 at 18:11
-
@LaurenceGonsalves Yes, the documentation is a little ambiguous; now I know that the numbers 1, 2, 4 and 8 simply mean 0001, 0010, 0100 and 1000 in binary. – lixiang Sep 22 '13 at 19:38
-
You need to put the code in a main block; just copying it into a file and compiling it will not work. – user2050516 Nov 05 '14 at 08:21
-
how can I make this "regex_t regex" a reallocable array if it's possible? – Shail_42 Nov 20 '14 at 07:18
-
@Shail_42 I'm not sure what you mean, but that sounds like it deserves a separate question. – Laurence Gonsalves Nov 20 '14 at 22:45
-
You can extract text from matching groups with something like: `regmatch_t matches[MAX_MATCHES]; if (regexec(&exp, sz, MAX_MATCHES, matches, 0) == 0) { memcpy(buff, sz + matches[1].rm_so, matches[1].rm_eo - matches[1].rm_so); printf("group1: %s\n", buff); }` Note that group matches start at 1; group 0 is the entire match. Add error checks for out of bounds, null termination of `buff`, etc. – BurnsBA Feb 07 '16 at 07:42
-
"Free compiled regular expression if you want to use the `regex_t` again"; this sounds wrong. The documentation says the function frees several internal fields. In other words, you should call `regfree()` regardless of whether you want to reuse the `regex_t`. In fact, the way the documentation puts it, you also have to call it on failed `regcomp()`s... – Yd Ahhrk Feb 15 '16 at 18:14
-
@YdAhhrk I agree that the "if you want to use the regex_t again" bit made no sense. I've removed that from the comment. Which documentation are you looking at that says you have to call it even if `regcomp` fails? I'm not seeing this in any docs I can find, and every example I can find does *not* call `regfree` if `regcomp` fails. eg: http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html – Laurence Gonsalves Feb 23 '16 at 01:44
-
@LaurenceGonsalves `man regcomp` says "`regerror()` is used to turn the error codes **that can be returned by both `regcomp()`** and `regexec()` into error message strings." What's interesting is that `regerror` requires as argument the `regex_t` that you tried to initialize during `regcomp`. This doesn't mean the `regex_t` still has allocated memory to release, but it's suspicious anyway. – Yd Ahhrk Feb 23 '16 at 21:07
-
@LaurenceGonsalves I found that, as long as you have called `regcomp` once (regardless of success), you can call `regfree` as many consecutive times as you want and it won't complain or crash. It really isn't designed like your average variation of the `free` function. I feel like the mentality of this API is that All pipelines after a `regcomp` can (and should) include a `regfree` for symmetric "candy" purposes. Like it wasn't designed by someone used to the `malloc-free` pattern (where you don't have to deallocate unless the init was successful). – Yd Ahhrk Feb 23 '16 at 21:08
-
@LaurenceGonsalves In real life though, implementations of `regcomp` and friends probably also terminate gracefully during uses more akin to `malloc-free`. It's what I've been led to infer, anyway. But as a very careful user of these functions, I'd rather play by contracts the documentation and the API seem to promise. – Yd Ahhrk Feb 23 '16 at 21:08
-
@LaurenceGonsalves As for the Internet examples, they are always simplified to remove clutter, are they not? Competent users are expected to fill in the blanks. I mean, for your sample code to be guaranteed to be 100% memory leak-free, shouldn't you also include a call to `regfree` before the `exit(1)`? – Yd Ahhrk Feb 23 '16 at 21:08
-
Regarding whether `regfree` is necessary after a failed `regcomp`: though it really is rather under-specified, this suggests that it shouldn't be done: https://www.redhat.com/archives/libvir-list/2013-September/msg00276.html – Daniel Jour Jun 17 '16 at 19:50
-
I'm disappointed that substitution is not natively supported in any standard library. I was hoping to learn C by creating some small but useful text-based tools (instead of with Awk or Bash): https://rosettacode.org/wiki/Regular_expressions#C . I guess the only way to do this is to use Flex + Bison which is really hardcore. – Sridhar Sarnobat Aug 24 '20 at 19:18
This is an example of using REG_EXTENDED. This regular expression
"^(-)?([0-9]+)((,|\\.)([0-9]+))?\n$"
lets you catch decimal numbers written in either the Spanish style (decimal comma) or the international style (decimal point). :)
#include <regex.h>
#include <stdlib.h>
#include <stdio.h>

regex_t regex;
int reti;
char msgbuf[100];

int main(int argc, char const *argv[])
{
    /* Read lines until end of input */
    while (fgets(msgbuf, sizeof(msgbuf), stdin) != NULL) {
        /* Compile regular expression; \\. matches a literal period */
        reti = regcomp(&regex, "^(-)?([0-9]+)((,|\\.)([0-9]+))?\n$", REG_EXTENDED);
        if (reti) {
            fprintf(stderr, "Could not compile regex\n");
            exit(1);
        }

        /* Execute regular expression */
        printf("%s\n", msgbuf);
        reti = regexec(&regex, msgbuf, 0, NULL, 0);
        if (!reti) {
            puts("Match");
        }
        else if (reti == REG_NOMATCH) {
            puts("No match");
        }
        else {
            regerror(reti, &regex, msgbuf, sizeof(msgbuf));
            fprintf(stderr, "Regex match failed: %s\n", msgbuf);
            exit(1);
        }

        /* Free memory allocated to the pattern buffer by regcomp() */
        regfree(&regex);
    }
    return 0;
}

-
Don't you think it's better to keep `regcomp` outside the loop, unless it needs to be initialized every time it is used? – milevyo Aug 06 '21 at 20:08
It's probably not what you want, but a tool like re2c can compile POSIX(-ish) regular expressions to ANSI C. It's written as a replacement for `lex`, but this approach allows you to sacrifice flexibility and legibility for the last bit of speed, if you really need it.

`man regex.h` doesn't show any manual entry for regex.h, but `man 3 regex` shows a page explaining the POSIX functions for pattern matching.
The same functions are described in The GNU C Library: Regular Expression Matching, which explains that the GNU C Library supports both the POSIX.2 interface and the interface the GNU C Library has had for many years.
For example, for a hypothetical program that prints which of the strings passed as arguments match the pattern passed as the first argument, you could use code similar to the following.
#include <errno.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void print_regerror (int errcode, size_t length, regex_t *compiled);

int
main (int argc, char *argv[])
{
  regex_t regex;
  int result;

  if (argc < 3)
    {
      // The number of passed arguments is lower than the number of
      // expected arguments.
      fputs ("Missing command line arguments\n", stderr);
      return EXIT_FAILURE;
    }

  result = regcomp (&regex, argv[1], REG_EXTENDED);
  if (result)
    {
      // Any value different from 0 means it was not possible to
      // compile the regular expression, either for memory problems
      // or problems with the regular expression syntax.
      if (result == REG_ESPACE)
        fprintf (stderr, "%s\n", strerror(ENOMEM));
      else
        fputs ("Syntax error in the regular expression passed as first argument\n", stderr);
      return EXIT_FAILURE;
    }

  for (int i = 2; i < argc; i++)
    {
      result = regexec (&regex, argv[i], 0, NULL, 0);
      if (!result)
        {
          printf ("'%s' matches the regular expression\n", argv[i]);
        }
      else if (result == REG_NOMATCH)
        {
          printf ("'%s' doesn't match the regular expression\n", argv[i]);
        }
      else
        {
          // The function returned an error; print the string
          // describing it.
          // Get the size of the buffer required for the error message.
          size_t length = regerror (result, &regex, NULL, 0);
          print_regerror (result, length, &regex);
          // Release the compiled pattern before bailing out.
          regfree (&regex);
          return EXIT_FAILURE;
        }
    }

  /* Free the memory allocated from regcomp(). */
  regfree (&regex);
  return EXIT_SUCCESS;
}

void
print_regerror (int errcode, size_t length, regex_t *compiled)
{
  char buffer[length];
  (void) regerror (errcode, compiled, buffer, length);
  fprintf(stderr, "Regex match failed: %s\n", buffer);
}
The last argument of `regcomp()` needs to include at least `REG_EXTENDED`, or the functions will use basic regular expressions, which means that (for example) you would need to use `a\{3\}` instead of the `a{3}` used in extended regular expressions, which is probably what you expect to use.
POSIX.2 also has another function for wildcard matching: `fnmatch()`. It does not let you compile the pattern or get the substrings matching a sub-expression, but it is designed specifically for checking whether a filename matches a wildcard (it uses the `FNM_PATHNAME` flag).

While the answer above is good, I recommend using PCRE2. With it you can use nearly all the regex examples found online as they are, without translating them into the older POSIX syntax.
I have already posted this in another answer, but I think it can help here too:
Regex In C To Search For Credit Card Numbers
// YOU MUST SPECIFY THE UNIT WIDTH BEFORE THE INCLUDE OF pcre2.h
#define PCRE2_CODE_UNIT_WIDTH 8
#include <stdio.h>
#include <string.h>
#include <pcre2.h>
#include <stdbool.h>

int main(){
    bool Debug = true;
    bool Found = false;
    pcre2_code *re;
    PCRE2_SPTR pattern;
    PCRE2_SPTR subject;
    int errornumber;
    int i;
    int rc;
    PCRE2_SIZE erroroffset;
    PCRE2_SIZE *ovector;
    size_t subject_length;
    pcre2_match_data *match_data;

    char * RegexStr = "(?:\\D|^)(5[1-5][0-9]{2}(?:\\ |\\-|)[0-9]{4}(?:\\ |\\-|)[0-9]{4}(?:\\ |\\-|)[0-9]{4})(?:\\D|$)";
    char * source = "5111 2222 3333 4444";

    pattern = (PCRE2_SPTR)RegexStr; // <<<<< This is where you pass your REGEX
    subject = (PCRE2_SPTR)source;   // <<<<< This is where you pass your buffer that will be checked.
    subject_length = strlen((char *)subject);

    re = pcre2_compile(
        pattern,               /* the pattern */
        PCRE2_ZERO_TERMINATED, /* indicates pattern is zero-terminated */
        0,                     /* default options */
        &errornumber,          /* for error number */
        &erroroffset,          /* for error offset */
        NULL);                 /* use default compile context */

    /* Compilation failed: print the error message and exit. */
    if (re == NULL)
    {
        PCRE2_UCHAR buffer[256];
        pcre2_get_error_message(errornumber, buffer, sizeof(buffer));
        printf("PCRE2 compilation failed at offset %d: %s\n", (int)erroroffset, (char *)buffer);
        return 1;
    }

    match_data = pcre2_match_data_create_from_pattern(re, NULL);

    rc = pcre2_match(
        re,
        subject,        /* the subject string */
        subject_length, /* the length of the subject */
        0,              /* start at offset 0 in the subject */
        0,              /* default options */
        match_data,     /* block for storing the result */
        NULL);

    if (rc < 0)
    {
        switch(rc)
        {
        case PCRE2_ERROR_NOMATCH: // printf("No match\n");
            pcre2_match_data_free(match_data);
            pcre2_code_free(re);
            Found = 0;
            return Found;
        /*
        Handle other special cases if you like
        */
        default:
            printf("Matching error %d\n", rc);
            break;
        }
        pcre2_match_data_free(match_data); /* Release memory used for the match */
        pcre2_code_free(re);               /* data and the compiled pattern. */
        Found = 0;
        return Found;
    }

    if (Debug){
        ovector = pcre2_get_ovector_pointer(match_data);
        printf("Match succeeded at offset %d\n", (int)ovector[0]);

        if (rc == 0)
            printf("ovector was not big enough for all the captured substrings\n");

        if (ovector[0] > ovector[1])
        {
            printf("\\K was used in an assertion to set the match start after its end.\n"
                   "From end to start the match was: %.*s\n", (int)(ovector[0] - ovector[1]),
                   (char *)(subject + ovector[1]));
            printf("Run abandoned\n");
            pcre2_match_data_free(match_data);
            pcre2_code_free(re);
            return 0;
        }

        /* Print the whole match (i == 0) and each captured substring. */
        for (i = 0; i < rc; i++)
        {
            PCRE2_SPTR substring_start = subject + ovector[2*i];
            size_t substring_length = ovector[2*i+1] - ovector[2*i];
            printf("%2d: %.*s\n", i, (int)substring_length, (char *)substring_start);
        }
    }
    else{
        if(rc > 0){
            Found = true;
        }
    }

    pcre2_match_data_free(match_data);
    pcre2_code_free(re);
    return Found;
}
Install PCRE2 using:
wget https://ftp.pcre.org/pub/pcre/pcre2-10.31.zip
unzip pcre2-10.31.zip
cd pcre2-10.31
./configure
make
sudo make install
sudo ldconfig
Compile using:
gcc foo.c -lpcre2-8 -o foo
Check my answer for more details.
