This is the C function I am having problems with:

char get_access_token(char *client_credentials)
{
    regex_t regex;
    int reti;
    char msgbuf[100];
    reti = regcomp(&regex, "\\\"access_token\\\".\\\"(.*?)\\\"", 0);

    regmatch_t pmatch[1];
    if (reti) {
        fprintf(stderr, "Could not compile regex\n");
        exit(1);
    }

    reti = regexec(&regex, client_credentials, 1, pmatch, 0);
    if (!reti) {
        puts("Match");
    } else if (reti == REG_NOMATCH) {
        puts("No match");
    } else {
        regerror(reti, &regex, msgbuf, sizeof(msgbuf));
        fprintf(stderr, "Regex match failed: %s\n", msgbuf);
        exit(1);
    }

    return (char) "";
}

The string that I'm trying to parse is a JSON string. I don't care about the actual structure; I only care about the access token.

It should look like this:

{"access_token": "blablablabla"}

I want my function to return just "blablablabla".

The RegEx that I'm trying to use is this one:

\"access_token"."(.*?)"

but I can't find the token in the variable pmatch. I only find two numbers in that array, and I don't really know what those numbers mean.

What am I doing wrong?

P.S. I'm a C noob, I'm just learning.

ILikeTacos

2 Answers

There are several problems. You have typos in your regex, and you're trying to use extended regex features with a basic POSIX regex.

First the typos.

reti = regcomp(&regex, "\\\"access_token\\\".\\\"(.*?)\\\"", 0);
                                            ^

That should be:

reti = regcomp(&regex, "\\\"access_token\\\": \\\"(.*?)\\\"", 0);

Next, quotes don't need to be escaped in a regex; only the C string literal needs the backslash. Dropping the extra escapes makes it easier to read.

reti = regcomp(&regex, "\"access_token\": \"(.*?)\"", 0);

This still doesn't work because it uses features that basic POSIX regexes do not have. In a basic POSIX regex, capture groups must be written with escaped parentheses, \( and \); REG_EXTENDED lets you write them unescaped. The non-greedy *? operator is not POSIX at all: it is an enhanced feature borrowed from Perl. On platforms that provide it (macOS, for example), you get it with REG_ENHANCED.

reti = regcomp(&regex, "\"access_token\": \"(.*?)\"", REG_ENHANCED|REG_EXTENDED);
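REG_ENHANCED is not defined on every platform (glibc, for instance, does not provide it as far as I know), so here is a minimal sketch of a variant that needs only REG_EXTENDED. It replaces the non-greedy (.*?) with the negated bracket expression [^"]*, which stops at the next quote, and it copies the captured span into a fresh buffer. The function name extract_access_token is hypothetical, and the sketch assumes the token contains no escaped quotes.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the token, or NULL if
 * the regex fails to compile, nothing matches, or malloc fails.
 * The caller must free() the result. */
char *extract_access_token(const char *json)
{
    regex_t regex;
    regmatch_t pmatch[2];   /* [0] whole match, [1] the token itself */
    char *token = NULL;

    /* [^\"]* stands in for the non-greedy (.*?): it cannot run past a quote. */
    if (regcomp(&regex, "\"access_token\": *\"([^\"]*)\"", REG_EXTENDED) != 0)
        return NULL;

    if (regexec(&regex, json, 2, pmatch, 0) == 0) {
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        token = malloc(len + 1);
        if (token != NULL) {
            memcpy(token, json + pmatch[1].rm_so, len);
            token[len] = '\0';
        }
    }

    regfree(&regex);
    return token;
}
```

Returning char * rather than char lets the caller receive the whole string; the caller owns the buffer and releases it with free().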

But don't try to parse JSON with a regex for all the same reasons we don't parse HTML with a regex. Use a JSON library such as json-glib.

Schwern
  • I tried using the library, but then I ran into other sorts of issues. That's why I thought it was going to be easier to just parse the JSON myself. Although I'm not a beginner developer, I'm very much a noob with C, so I'm having trouble with it. – ILikeTacos Sep 01 '18 at 17:34
  • @ILikeTacos C is unforgiving. I imagine you might have had some trouble with shared libraries. Maybe try out json-glib and ask a question if you run into trouble. – Schwern Sep 01 '18 at 17:40
  • I did have a lot of trouble with shared libraries, but I figured it out. All I needed to do was to pass the path to the library to the linker, but I banged my head against the wall for a few seconds. I'll probably be spamming SO with questions about json-glib as soon as I finish installing it. – ILikeTacos Sep 01 '18 at 18:03
  • @ILikeTacos `pkg-config` can save you a lot of headaches when it comes to linking libraries. – Schwern Sep 01 '18 at 18:37
  • I think the code could replace the occurrences of `\\\"` with just `\"`. Within the regexes, a double quote isn't special (not a metacharacter) and doesn't need a backslash escape. – Jonathan Leffler Sep 01 '18 at 18:53

Your pmatch array must have at least two elements. Group 0 is always the match of the whole regexp (as if the entire expression were surrounded by a pair of parentheses). You want group 1, so pmatch[1] will be filled with the information about the first parenthesized subexpression.

If you look at the documentation, each pmatch element has two fields, rm_so and rm_eo. rm_so is the offset in the original buffer where the match begins, and rm_eo is the offset one past its last character. Just like the ones in pmatch[0], the fields in pmatch[1] give the offsets where the (sub)expression begins and ends, respectively.
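To make the two numbers concrete, here is a small sketch that records the offsets for the sample string from the question. The helper name token_span is made up for illustration, and the pattern is an assumed REG_EXTENDED variant rather than the one from the question.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: fills out[0] (whole match) and out[1] (group 1)
 * with the offsets found in json. Returns 0 on a match, nonzero otherwise. */
int token_span(const char *json, regmatch_t out[2])
{
    regex_t regex;
    int rc;

    if (regcomp(&regex, "\"access_token\": *\"([^\"]*)\"", REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&regex, json, 2, out, 0);
    regfree(&regex);
    return rc;
}
```

For the input {"access_token": "blablablabla"}, out[0] spans offsets 1 to 31 (everything from the first quote through the closing quote of the value), and out[1] spans 18 to 30, which is exactly the 12 characters of the token. The length of a group is always rm_eo - rm_so.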

Once you know the offsets are valid (see the documentation), you can print the matched groups with:

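#define SIZEOF(arr) (sizeof (arr) / sizeof (arr)[0])
...
regmatch_t pmatch[2]; /* pmatch[0]: whole regexp, pmatch[1]: group 1 */
...
/* you don't need to escape " chars for regcomp, they are not special there.
 * They are special for C, however, so only one \ must be used.
 * REG_EXTENDED is needed so that ( ) form a group without being escaped,
 * ": *" matches the colon and optional spaces, and [^\"]* stops at the
 * closing quote. */
reti = regcomp(&regex, "\"access_token\": *\"([^\"]*)\"", REG_EXTENDED);
...
reti = regexec(&regex, client_credentials, SIZEOF(pmatch), pmatch, 0);

/* re_nsub counts only the parenthesized groups, so use <= to include group 0 */
for (size_t i = 0; reti == 0 && i <= regex.re_nsub && i < SIZEOF(pmatch); i++) {
    const char *p = client_credentials + pmatch[i].rm_so; /* p points to beginning of match */
    int l = (int)(pmatch[i].rm_eo - pmatch[i].rm_so); /* match length; .* expects an int */
    printf("Group #%zu: %.*s\n", i, l, p);
}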

My apologies for submitting a snippet instead of a complete, verifiable example, but since the question didn't include one (so we could not test your sample code), I haven't done so in the answer either. The code is therefore untested and may contain errors on my side. Beware of this.

Testing a sample response takes time, and even more when we first have to make your sample code testable at all. (This is a complaint about beginners, and some non-beginners, not posting a Minimal, Complete, and Verifiable Example.)

Luis Colorado