1

I have tried different functions including strtok(), strcmp() and strstr(), but I guess I'm missing something. Is there a way to match the exact substring in a string?

For example:

If I have a name: "Tan"

And I have 2 file names: "SomethingTan5346" and "nothingTangyrs634"

So how can I make sure that I match the first string and not both? Because the second file is for the person Tangyrs. Or is it impossible with this approach? Am I going at it the wrong way?

msc
Rishaal
  • 158
  • 13
  • 4
    What do you mean 'exact' substring? Both have the substring. – J...S Nov 01 '17 at 08:41
  • 3
    The first thing you have to do is strictly define what you mean by "name". You seem to be breaking your two example file names into separate strings. To fully match those strings, one approach would be to separate a filename into what you define as its separate substrings, and then look for *complete* matches on the substrings with something like `strcmp()`. – Andrew Henle Nov 01 '17 at 08:41
  • 1
    What is the structure of your filenames? Is SOMETHINGTAN5346 a valid filename? Should TAN be found here? – Costantino Grana Nov 01 '17 at 08:47
  • 1
    As others have said, decide what is your convention for the file names first. Couldn't you `p = strstr(str, "Tan")` and if it is found check whether the next character at `p[strlen("Tan")]` is a digit? (That would fail if there are users with names `"Tan5"` and `"gTan"`, of course.) – M Oehm Nov 01 '17 at 08:47
  • Why should "SomethingTan5346" be matched but not "nothingTangyrs634"? Both contain the substring "Tan". You need to be more specific. – Jabberwocky Nov 01 '17 at 08:59
  • 3
    It looks like you have a too vague idea of what your code should do. At one place you mention an "exact substring" without defining what it is. At other place you mention a "file" that is "for" a "person", again without defining what these words mean. If you think these things are self-evident, think again. Is `!@#$Tz:Gh012NV65$#` a "file"? What "person" is it "for"? Why? – n. m. could be an AI Nov 01 '17 at 09:03

3 Answers

4

If, as seems to be the case, you just want to identify strings that have your text but are immediately followed by a digit, your best bet is probably to get yourself a good regular expression implementation and just search for Tan[0-9].

It could be done simply by using strstr() to find the string and then checking the character following it with isdigit(), but the actual code to do that would be:

  1. not as easy as you think, since you may have to do multiple searches (e.g., TangoTangoTan42 would need three checks); and
  2. inadvisable if there's a chance the searches may become more complex (such as Tan followed by 1-3 digits or exactly two @ characters and an X).

A regular expression library will make this much easier, provided you're willing to invest a little effort into learning about it.
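
As a rough illustration of the regex route, here is a minimal sketch that assumes a POSIX system where <regex.h> is available. The helper name matchesPattern is made up for this example; regcomp(), regexec() and regfree() are the standard POSIX calls.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` contains a match for the POSIX
 * extended regular expression `pattern`, 0 if it does not, and -1 if the
 * pattern could not be compiled. */
int matchesPattern(const char *pattern, const char *text) {
    regex_t re;

    /* REG_NOSUB: we only need a yes/no answer, not match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

Calling matchesPattern("Tan[0-9]", "SomethingTan5346") should report a match, while "nothingTangyrs634" should not. If the rule changes later, only the pattern string needs to change, not the search code.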


If you don't want to invest the time in learning regular expressions, the following complete test program should be a good starting point to evaluate a string based on the requirements in the first paragraph:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int hasSubstrWithDigit(const char *lookFor, const char *searchString) {
    // Cache length and set initial search position.

    size_t lookLen = strlen(lookFor);
    const char *foundPos = searchString;

    // Keep looking for string until none left.

    while ((foundPos = strstr(foundPos, lookFor)) != NULL) {
        // If at end, no possibility of following digit.

        if (strlen(foundPos) == lookLen) return 0;

        // If followed by digit, return true.

        if (isdigit((unsigned char)foundPos[lookLen])) return 1;

        // Otherwise keep looking, from next character.

        foundPos++;
    }

    // Not found, return false.

    return 0;
}

int main(int argc, char *argv[]) {
    if (argc < 3) {
        printf("Usage: testprog <lookFor> <searchIn>...\n");
        return 1;
    }
    for (int i = 2; i < argc; ++i) {
        printf("Result of looking for '%s' in '%s' is %d\n", argv[1], argv[i], hasSubstrWithDigit(argv[1], argv[i]));
    }
    return 0;
}

Though, as you can see, it's not as elegant as a regex search, and is likely to become even less elegant if your requirements change :-)

Running that with:

./testprog Tan xyzzyTan xyzzyTan7 xyzzyTangy4 xyzzyTangyTan12

shows it in action:

Result of looking for 'Tan' in 'xyzzyTan' is 0
Result of looking for 'Tan' in 'xyzzyTan7' is 1
Result of looking for 'Tan' in 'xyzzyTangy4' is 0
Result of looking for 'Tan' in 'xyzzyTangyTan12' is 1
paxdiablo
1

The solution depends on your definition of exact matching. This might be useful for you:

  1. Traverse all matches of the target substring.

C find all occurrences of substring

Finding all instances of a substring in a string

find the count of substring in string

https://cboard.cprogramming.com/c-programming/73365-how-use-strstr-find-all-occurrences-substring-string-not-only-first.html

etc.

  2. Given the span of each match, verify that the preceding and following characters satisfy (or do not satisfy) your criterion for an "exact match".
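
The two steps above can be sketched roughly as follows. This is only an illustration: findBounded, BoundaryTest and notLetter are names invented here, and "a neighbouring character must not be a letter" is an assumed criterion that you would replace with your own definition of an exact match.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A boundary test returns nonzero if `c` is acceptable next to a match. */
typedef int (*BoundaryTest)(char c);

/* Example criterion (an assumption): any non-letter, including '\0'. */
int notLetter(char c) {
    return !isalpha((unsigned char)c);
}

/* Returns a pointer to the first occurrence of `name` in `text` whose
 * neighbouring characters pass the given tests, or NULL if there is none.
 * A NULL test accepts any neighbour. The start of the string is presented
 * to `beforeOk` as '\0', and so is the end of the string to `afterOk`. */
const char *findBounded(const char *text, const char *name,
                        BoundaryTest beforeOk, BoundaryTest afterOk) {
    size_t nameLen = strlen(name);
    if (nameLen == 0) return NULL;

    /* Visit every occurrence, restarting one character past the last hit. */
    for (const char *p = strstr(text, name); p != NULL; p = strstr(p + 1, name)) {
        char before = (p == text) ? '\0' : p[-1];
        char after = p[nameLen];

        if ((beforeOk == NULL || beforeOk(before)) &&
            (afterOk == NULL || afterOk(after))) {
            return p;
        }
    }
    return NULL;
}
```

With this criterion, findBounded("SomethingTan5346", "Tan", NULL, notLetter) finds the name, while the same call on "nothingTangyrs634" returns NULL because the character after "Tan" is a letter.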

Or,

  1. You could take advantage of regex in C++ (I know the tag is "C"), with #include <regex>, or POSIX #include <regex.h>.
1

You may want to use strstr(3) to search for a substring in a string, strchr(3) to search for a character in a string, or even regular expressions with regcomp(3).

You should read more about parsing techniques, notably about recursive descent parsers. In some cases, sscanf(3) with %n can also be handy. Be sure to check the count it returns.
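
To illustrate the sscanf(3) with %n idea, here is a small sketch. It assumes a file name convention of letters followed by digits and nothing else (as in SomethingTan5346), and treats the name as the tail of the letter part; both are assumptions, as is the function name fileIsForName.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check, assuming file names look like <letters><digits>.
 * Returns 1 if the letter part ends with `name`, 0 otherwise. */
int fileIsForName(const char *fileName, const char *name) {
    char letters[64];
    char digits[16];
    int consumed = 0;

    /* Two conversions are expected; %n stores how many characters were
     * read and does not count toward the return value. The A-Z style
     * ranges in a scanset are implementation-defined but widely supported. */
    if (sscanf(fileName, "%63[A-Za-z]%15[0-9]%n", letters, digits, &consumed) != 2)
        return 0;

    /* Reject trailing characters (and overlong parts) after the digits. */
    if (fileName[consumed] != '\0') return 0;

    size_t lettersLen = strlen(letters);
    size_t nameLen = strlen(name);
    if (nameLen == 0 || nameLen > lettersLen) return 0;

    /* Compare the tail of the letter part against the whole name. */
    return strcmp(letters + lettersLen - nameLen, name) == 0;
}
```

Under these assumptions, "SomethingTan5346" is accepted for "Tan" and "nothingTangyrs634" is not. A suffix test like this would still accept a shorter name such as "an", which is why the naming convention has to be pinned down first.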

You could loop to read then parse every line, perhaps using getline(3), see this.

You need first to document your input file format (or your file name conventions, if SomethingTan5346 is some file path), perhaps using EBNF notation.

(You will probably want to combine several of the approaches I am suggesting above.)

BTW, I recommend limiting (for your convenience) file paths to a restricted set of characters. For example using * or ; or spaces or tabs in file paths is possible (see path_resolution(7)) but should be frowned upon.

Basile Starynkevitch