I'm trying to match words that don't contain the letter 'd', but regexec still matches words that contain the letter 'd'.

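#include <regex.h>
#include <stdio.h>

int main(void) {
    regex_t regex;
    char *str = "dabcd";
    char *pattern = "[^d]*";

    int ret;

    /* flags = 0 selects POSIX basic regular expression syntax */
    ret = regcomp(&regex, pattern, 0);
    if (ret == 0) {
        printf("regex compilation successful\n");
    } else {
        printf("regex compilation unsuccessful\n");
        return 1; /* regex is not usable after a failed regcomp */
    }

    /* nmatch = 0 and pmatch = NULL: only ask whether there is a match */
    ret = regexec(&regex, str, 0, NULL, 0);
    if (ret == 0) {
        printf("there is a match\n");
    } else {
        printf("there is no match : %d\n", ret);
    }

    regfree(&regex); /* release the compiled pattern */
    return 0;
}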

How can I solve this? Is there something wrong with my regular expression?

  • `char *pattern = "^[^d]*$";` and `ret = regcomp(&regex, pattern, REG_EXTENDED);` – Wiktor Stribiżew Oct 17 '20 at 10:25
  • Okay, let me clarify. If I want to match any characters (one or more) for the placeholder in www.*.com, where * can be any character but must not contain the letter 'd', what regular expression should I use? @WiktorStribiżew – blackdronzer Oct 17 '20 at 10:56
  • So, what is the pattern you tried to match the string? `[^d]*` is not the pattern. What is the string like? Where `www...com` can appear inside a string? – Wiktor Stribiżew Oct 17 '20 at 10:57
  • So www.abc.com should be valid and www.xyz.com should be valid, but www.abcd.com and www.xyzd.com should not be valid. What regex should I use for this? – blackdronzer Oct 17 '20 at 11:00
  • Look [here](https://regex101.com/r/YlOfjV/1). – Wiktor Stribiżew Oct 17 '20 at 11:05
  • Your expression even matches aka.abc.bky! I want www..com, there has to be www..com – blackdronzer Oct 17 '20 at 11:11
  • It is your regex, not mine, just with anchors. What did *you* try to match `www...com`? What do you mean by "character"? A letter? A letter or digit? ... – Wiktor Stribiżew Oct 17 '20 at 11:13

1 Answer

Your expression only checks that somewhere in the string there is a sequence of zero or more non-d characters. Any string satisfies that, because an empty sequence counts. If you want the whole string to match:

^[^d]*$

That translates to: the start of the string, then zero or more non-d characters, then the end of the string.

This will require altering your compilation code:

ret = regcomp(&regex, pattern, REG_EXTENDED);

The documentation describes the REG_EXTENDED flag as follows:

Use POSIX Extended Regular Expression syntax when interpreting regex. If not set, POSIX Basic Regular Expression syntax is used.

The man pages for BSD express it even better:

Compile modern ("extended") REs, rather than the obsolete ("basic") REs that are the default.

So use that mode by default.

Tip: For debugging regular expressions use an explainer tool like Regex101.

tadman
  • Just adding `$` won't work in the OP code. – Wiktor Stribiżew Oct 17 '20 at 10:28
  • @WiktorStribiżew You should add an answer with your variant including the `REG_EXTENDED` code. I can only speak to the expression itself, I'm not as familiar with that library. – tadman Oct 17 '20 at 10:29
  • This regular expression is correct. I did not touch on the C code changes required to make it work. – tadman Oct 17 '20 at 10:31
  • @tadman, what if I want it in the middle of a string? Say I want to match ' abc ', where the middle word should not contain the 'd' character. What regex should I use? – blackdronzer Oct 17 '20 at 10:43
  • Anything with `*` could be zero-length, so "middle of string" will always match and is basically meaningless. It's like asking "Where in the world can I find zero or more unicorns?" Everywhere. When asking regular expression questions it's very useful for us to see example input and the desired matches, but don't forget you can probably hack around with things like Regex101 to explore on your own a fair bit first. – tadman Oct 17 '20 at 10:44
  • Are you just looking for the longest match of non-`d` characters? If so your original pattern does just that. Remember when an expression matches you can extract the *matching text* if so desired. That's what the other arguments you're not using are used for. – tadman Oct 17 '20 at 10:46
  • Okay, let me clarify. If I want to match any characters (one or more) for the placeholder in www.*.com, where * can be any character but must not contain the letter 'd', what regular expression should I use? @tadman – blackdronzer Oct 17 '20 at 10:53
  • This has diverged pretty significantly from the original question, so I'd open up a new one focused on specifically that. – tadman Oct 17 '20 at 11:10
  • No, it is the same question. Please remove the answer as it does not answer the question. – Wiktor Stribiżew Oct 17 '20 at 11:26