
I am using the standard <regex.h> library to match a complex string. Some of the captured groups are integers, and my current solution is to convert them with to_integer:

int to_integer(regmatch_t match, const char* str) {
    char buffer[16];
    int end = match.rm_eo - match.rm_so;
    memcpy(buffer, &str[match.rm_so], end);
    buffer[end] = '\0';
    return atoi(buffer);
}

struct data {
    int foo, bar, baz;
};

struct data fetch(char *string) {
   int groups = 4;
   regmatch_t matches[groups];
   if (regexec(&regex, string, groups, matches, 0)) abort();

   return (struct data){
       to_integer(matches[1], string),
       to_integer(matches[2], string),
       to_integer(matches[3], string)
   };
}

Is there a more elegant way that does not involve an intermediate buffer?

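Without an intermediate buffer, a pattern such as ([0-9]{3})[0-9]{2} would fail: the captured group is immediately followed by more digits, so atoi would keep reading past the end of the match. I also cannot terminate the match in place, because str is const.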

EDIT

From this question I wrote the following:

int to_integer(regmatch_t match, const char* str) {
    const char *end = &str[match.rm_eo];
    return strtol(&str[match.rm_so], (char**)&end , 10);
}

Unfortunately, the explicit cast (char**) is quite ugly. My previous solution, which copies the match into a buffer, looks a bit safer IMHO.

nowox
  • @bruno it does not work if the current match is directly followed by a number – nowox Oct 18 '20 at 10:35
  • Your current approach seems fine. Maybe some logic to make sure you're not going to write past the end of `buffer`. – Shawn Oct 18 '20 at 11:10
  • Please check the [How to convert a string to integer in C?](https://stackoverflow.com/questions/7021725/how-to-convert-a-string-to-integer-in-c) thread. – Wiktor Stribiżew Oct 18 '20 at 11:26
  • In the EDIT section, the `end` argument to `strtol()` is a pure output argument, but the code seems to treat it as an input argument, or as an in/out argument. You may as well pass a NULL pointer since you don't look at `end` after calling `strtol()`. If you decide to keep it, it may as well be uninitialized and of the correct type (`char *`). If the regex doesn't necessarily consume all the digits in a digit sequence, using `strtol()` is not correct. It will read past the end of the matched digits. – Jonathan Leffler Oct 18 '20 at 16:56
