4

I have a text file which contains the following text:

License = "123456"

GeneralLicense = "56475655"

I want to search for License as well as for GeneralLicense.

while (getline(FileStream, CurrentReadLine))
{

    if (CurrentReadLine.find("License") != std::string::npos)
    {
        std::cout << "License Line: " << CurrentReadLine;
    }
    if (CurrentReadLine.find("GeneralLicense") != std::string::npos)
    {
        std::cout << "General License Line: " << CurrentReadLine;
    }
}

Since the word License is also present in the word GeneralLicense, the condition if (CurrentReadLine.find("License") != std::string::npos) is true for both lines.

How can I specify that I want to search for the exact sub-string?

UPDATE: I can reverse the order as mentioned in some answers, or check whether License is at index zero. But isn't there anything ROBUST (a flag or something) which we can specify to look for the exact match, like the option most editors have (e.g. MS Word)?

skm
  • Possible duplicate of [how to check string start in C++](http://stackoverflow.com/questions/8095088/how-to-check-string-start-in-c) – Bernhard Barker May 19 '17 at 15:33
  • @RawN: I can do that but it won't be very robust because there is no space before `License` and it's not certain that there will always be a space between `License` and `=`. – skm May 19 '17 at 15:33
  • Swap the order around, check for `GeneralLicense` first, and then check for `License`. – Sean May 19 '17 at 15:33
  • @Sean That, and add a `break` after the find. – Sergey Kalinichenko May 19 '17 at 15:34
  • The duplicate is a possible solution to this. "Search for the exact sub-string" doesn't make much sense - that's exactly what `string::find` already does. You'll need to be more precise about the format of the file and what should and shouldn't match. – Bernhard Barker May 19 '17 at 15:34
  • If you want something robust I think you are going to need to use a `regex`. Most find options will find a word even if it is part of a larger word, unless you use an option like `Find whole words only` in MS Word. – NathanOliver May 19 '17 at 15:45
  • Don't hard-code the strings you want to find like that. Put them in `std::vector` and sort them by length, and then match in a loop. That's what one does when one composes an alternating pattern for a regex match anyway. – Sinan Ünür May 19 '17 at 17:22

6 Answers

5
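while (getline(FileStream, CurrentReadLine))
{
    // Test the longer key first: "License" is a substring of "GeneralLicense",
    // so the else-branch only runs for lines that lack "GeneralLicense".
    if (CurrentReadLine.find("GeneralLicense") != std::string::npos)
    {
        std::cout << "General License Line: " << CurrentReadLine << '\n';
    }
    else if (CurrentReadLine.find("License") != std::string::npos)
    {
        std::cout << "License Line: " << CurrentReadLine << '\n';
    }
}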
Jeffrey Chung
  • Can you add a sentence or two that explains what you did and how it solves the OP's problem? – R Sahu May 19 '17 at 15:37
  • This requires that you always put things in the correct order. That is very prone to human error, especially if you were to try to modify this long after you've forgotten why you did it this way. – Bernhard Barker May 19 '17 at 15:39
  • @Dukeling it is not nearly as obscure or fragile if you put the strings you want to match in a vector and sort that by word length. – Sinan Ünür May 19 '17 at 17:23
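
The idea from the last comment, keeping the keys in a container and sorting them by length so the order cannot be gotten wrong by hand, could look roughly like the sketch below. `longestKeyIn` is a hypothetical helper name chosen for illustration; it is not part of the answer above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns the longest key that occurs in `line`,
// or an empty string if none of the keys occurs.
std::string longestKeyIn(std::vector<std::string> keys, const std::string& line)
{
    // Sort longest first so "GeneralLicense" is tried before "License".
    std::sort(keys.begin(), keys.end(),
              [](const std::string& a, const std::string& b) {
                  return a.size() > b.size();
              });
    for (const auto& key : keys)
        if (line.find(key) != std::string::npos)
            return key;
    return {};
}
```

Calling `longestKeyIn({"License", "GeneralLicense"}, CurrentReadLine)` inside the read loop would then report which key the line contains. Note that this is still a plain substring search, so a line containing, say, `MyLicenseKey` would also be reported as `License`.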
2

The more ROBUST search is called a regex:

#include <regex>

// Build each pattern once instead of on every iteration.
const std::regex licensePattern(".*\\bLicense\\b.*=.*");
const std::regex generalLicensePattern(".*\\bGeneralLicense\\b.*=.*");

while (getline(FileStream, CurrentReadLine))
{
    if (std::regex_match(CurrentReadLine, licensePattern))
    {
        std::cout << "License Line: " << CurrentReadLine << std::endl;
    }
    if (std::regex_match(CurrentReadLine, generalLicensePattern))
    {
        std::cout << "General License Line: " << CurrentReadLine << std::endl;
    }
}

The \b escape sequences denote word boundaries.

.* means "any sequence of characters, including zero characters".

EDIT: You could also use regex_search instead of regex_match to search for substrings that match instead of using .* to cover the parts that don't match:

#include <regex>

// Build each pattern once instead of on every iteration.
const std::regex licenseWord("\\bLicense\\b");
const std::regex generalLicenseWord("\\bGeneralLicense\\b");

while (getline(FileStream, CurrentReadLine))
{
    if (std::regex_search(CurrentReadLine, licenseWord))
    {
        std::cout << "License Line: " << CurrentReadLine << std::endl;
    }
    if (std::regex_search(CurrentReadLine, generalLicenseWord))
    {
        std::cout << "General License Line: " << CurrentReadLine << std::endl;
    }
}

This more closely matches your code, but note that it will get tripped up if the keywords are also found after the equals sign. If you want maximum robustness, use regex_match and specify exactly what the whole line should match.
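
As a rough illustration of specifying the whole line, here is a minimal sketch that assumes every line has the form Key = "Value" with optional whitespace around the parts (the question only shows two such lines, so this format is an assumption). `lineHasKey` is a hypothetical helper name, not something from the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true only if `line` has the assumed form
//   Key = "Value"
// and Key is exactly `key`. On success, `value` receives the quoted text.
bool lineHasKey(const std::string& line, const std::string& key, std::string& value)
{
    // Optional spaces, a word (the key), optional spaces, '=', optional
    // spaces, a double-quoted value, optional spaces.
    static const std::regex pattern(R"re(\s*(\w+)\s*=\s*"([^"]*)"\s*)re");

    std::smatch m;
    if (!std::regex_match(line, m, pattern) || m[1].str() != key)
        return false;
    value = m[2].str();
    return true;
}
```

Because the key is captured as a whole word and compared for equality, `License` does not match a `GeneralLicense` line, and a keyword that appears only inside the quoted value is ignored.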

L. Scott Johnson
0

You can check if the position at which the substring appears is at index zero, or that the character preceding the initial position is a space:

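bool findAtWordBoundary(const std::string& line, const std::string& search) {
    size_t pos = line.find(search);
    // Cast to unsigned char: passing a negative char value to isspace is undefined behavior.
    return (pos != std::string::npos) && (pos == 0 || isspace(static_cast<unsigned char>(line[pos - 1])));
}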

> Isn't there anything ROBUST (flag or something) which we can specify to look for the exact match?

In a way, find already looks for an exact match. However, it treats a string as a sequence of meaningless numbers that represent individual characters. That is why the std::string class lacks the concept of a "whole word", which is present in other parts of the library, such as regular expressions.
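
Since `std::string` has no such concept, a "whole word" search has to be built on top of `find`. The following is a minimal sketch that checks both ends of every occurrence; it assumes that a word consists of letters, digits, and underscores, and `containsWholeWord` is a hypothetical helper name.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if `word` occurs in `line` with no letter,
// digit, or underscore directly before or after it.
bool containsWholeWord(const std::string& line, const std::string& word)
{
    auto isWordChar = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    if (word.empty())
        return false;
    // Examine every occurrence, not just the first one.
    for (std::size_t pos = line.find(word); pos != std::string::npos;
         pos = line.find(word, pos + 1)) {
        const std::size_t end = pos + word.size();
        const bool startOk = (pos == 0) || !isWordChar(line[pos - 1]);
        const bool endOk = (end == line.size()) || !isWordChar(line[end]);
        if (startOk && endOk)
            return true;
    }
    return false;
}
```

With this, `containsWholeWord(CurrentReadLine, "License")` is false for the `GeneralLicense` line, and it still works when there is no space between the key and `=`.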

Sergey Kalinichenko
0

You could write a function that tests for the longest match first and then returns whatever information you want about the match.

Something a bit like:

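#include <iostream>
#include <set>
#include <string>

// Order longer strings first; break ties alphabetically so that
// different strings of equal length are both kept in the set.
struct LongestFirst {
    bool operator()(std::string const& a, std::string const& b) const
    {
        return a.size() != b.size() ? a.size() > b.size() : a < b;
    }
};

// find the longest matching element from the set and return it
std::string find_one_of(std::set<std::string, LongestFirst> const& tests, std::string const& s)
{
    for(auto const& test: tests)
        if(s.find(test) != std::string::npos)
            return test;
    return {};
}

int main()
{
    std::string text = "abcdef";

    auto found = find_one_of({"a", "abc", "ab"}, text);

    std::cout << "found: " << found << '\n'; // prints "found: abc"
}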
Galik
0

If every match starts at position 0 and no key is a prefix of another, then the following might work:

if (CurrentReadLine.substr(0, 7) == "License")
0

You can tokenize your string and compare each token in full against your search key.

Example:

#include <string>
#include <sstream>
#include <vector>
#include <iostream>

auto tokenizer(const std::string& line)
{
    std::vector<std::string> results;
    std::istringstream ss(line);
    std::string s;
    while(std::getline(ss, s, ' '))
        results.push_back(s);
    return results;
}

auto compare(const std::vector<std::string>& tokens, const std::string& key)
{
    for (auto&& i : tokens)
        if ( i == key )
            return true;
    return false;
}

int main()
{
    std::string x = "License = \"12345\"";
    auto token = tokenizer(x);
    std::cout << compare(token, "License") << std::endl;        // prints 1
    std::cout << compare(token, "GeneralLicense") << std::endl; // prints 0
}
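
Splitting on spaces relies on the key being surrounded by spaces, which the question says is not guaranteed. One possible variation, sketched here under the assumption that every line has the form key = value, is to split at the first `=` and trim spaces and tabs from the key before comparing. `extractKey` is a hypothetical helper name.

```cpp
#include <string>

// Hypothetical helper: returns the text before the first '=' with
// surrounding spaces and tabs removed, or an empty string if the line
// contains no '=' or has nothing but whitespace before it.
std::string extractKey(const std::string& line)
{
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos)
        return {};
    const std::string left = line.substr(0, eq);
    const std::size_t first = left.find_first_not_of(" \t");
    if (first == std::string::npos)
        return {};  // nothing but whitespace before '='
    const std::size_t last = left.find_last_not_of(" \t");
    return left.substr(first, last - first + 1);
}
```

The result can then be compared with `==`, for example `extractKey(line) == "License"`, which is an exact comparison rather than a substring search.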
Amadeus