0

Can you help me with this aproach :

The thing is, we need to do a case-insensitive search for the keywords in a string (for a function which return true is if any of the keyword is found in the string, elsewise false)

So I am using this piece of code:

    std::transform(string.begin(), string.end(), string.begin(), ::toupper);
    std::transform(keywords.begin(), keywords.end(), keywords.begin(), ::toupper);

    std::istringstream iss(keywords);
    std::string word;

    while(iss >> word) {
        if(string.find(word) != std::string::npos)
        return true;
    }
    return false;

The problem with this is that it creates unnecessary copies of the existing data. Can there be a better approach to it?

  • Don't do the toupper for your keywords all the time, just have them as uppercase symbols in a std::set. Then you only have to uppercase your input and check if it is in the map of keywords. – Pepijn Kramer Jul 30 '22 at 09:54
  • `while(iss > word)` - I'm not familiar with this syntax of comparing a stringstream to a string. Can you elaborate on what that does? Is there some sort of operator overload or implicit cast going on. What type of collection is `keywords`. I'm really confused on this one because I thought the only thing you can pass to an istringstream constructor is a string. – selbie Jul 30 '22 at 10:01
  • @selbie The stream is created here: `std::istringstream iss(keywords);`. `while(iss > word)` is probably supposed to be `while(iss >> word)` – fabian Jul 30 '22 at 10:24
  • Yes it was a typing mistake from my side ! – Harshit Sharma Jul 30 '22 at 11:07

3 Answers3

0

First of all for making this more reuseable creating an object responsible for holding the keyword data would be preferrable. You can use std::string_views, std::pair<std::string::const_iterator, std::string::const_iterator> or something similar to avoid making a copy of the string data for the keywords and using std::search to find the keywords allows you to prevent having to copy the string to convert it to upper case for a search while also keeping benefit of converting the keywords to upper case:

class KeywordSearch
{
    std::vector<std::string_view> m_keywords;
    std::string m_keywordData;
public:
    KeywordSearch(std::string&& keywords)
        : m_keywordData(std::move(keywords))
    {
        auto pos = m_keywordData.begin();
        auto const end = m_keywordData.end();
        std::for_each(pos, end, [](char& c) { c = std::toupper(c); });
        pos = std::find_if(pos, end, [](unsigned char c) { return !std::isspace(c); });
        while (pos != end)
        {
            auto keywordEnd = std::find_if(pos + 1, end, [](unsigned char c) { return std::isspace(c); });
            m_keywords.emplace_back(pos, keywordEnd);
            pos = std::find_if(keywordEnd, end, [](unsigned char c) { return !std::isspace(c); });
        }
    }

    // allow only move for now; copy would require an update of m_keywords
    KeywordSearch(KeywordSearch&&) noexcept = default;
    KeywordSearch& operator=(KeywordSearch&&) noexcept = default;

    bool operator()(std::string const& haysack) const
    {
        for (auto const& keyword : m_keywords)
        {
            if (std::search(haysack.begin(), haysack.end(), keyword.begin(), keyword.end(),
                [](char haysackChar, char keywordChar)
                {
                    return std::toupper(haysackChar) == keywordChar;
                }) != haysack.end())
            {
                return true;
            }
        }
        return false;
    }
};

void Test(KeywordSearch const& search, std::string const& str)
{
    std::cout << (search(str) ? "    keyword found in '" : "keyword not found in '") << str << "'\n";
}

int main() {
    KeywordSearch search("foo bar baz");
    Test(search, "NoFoOB");
    Test(search, "barblabla");
    Test(search, "babbba");
    Test(search, "hello world");
    Test(search, "hello wobaz");
}
fabian
  • 80,457
  • 12
  • 86
  • 114
0

Yes, maybe. If you want to avoid copies, then you can use an iterator.

C++ offers a functionality to iterate over paterrns in a string. This is the std::sregex_token_iterator. You can read here about that. You can either define a "positive" pattern, so, what you are looking for. Example: "\w+" will look for words. Or, you do a "negative" search, meaning, specify the separator (e.g., ' ' as a std::regex) and add "-1" as fourth parameter.

Then you can iterate over all keywords.

As for the case insenitivity. Do the conversion one time. I will not even show it in my below example.

First I created a small demo, where I print out the keywords that have been found in the given std::string.

#include <iostream>
#include <regex>
#include <string>
#include <iterator>
#include <algorithm>

const std::regex re{ R"(\w+)" };

int main() {

    // Keys
    const std::string keys{ "abc def ghi jkl" };
    // Search string
    std::string s{ "abcxxxghixxx" };

    std::copy_if(std::sregex_token_iterator(keys.begin(), keys.end(), re), {},
        std::ostream_iterator<std::string>(std::cout,"\n"),
        [&s](const std::string& k) {return s.find(k) != std::string::npos; });
}

This approach can be taken over to build your needed function.

One of many possible solutions:

#include <iostream>
#include <regex>
#include <string>
#include <iterator>
#include <algorithm>

const std::regex re{ R"(\w+)" };

bool isAnyKeyWordInString(const std::string& keys, const std::string& s) {

    bool result{};

    std::for_each(std::sregex_token_iterator(keys.begin(), keys.end(), re), {},
        [&](const std::string& k) {result |= (s.find(k) != std::string::npos); });

    return result;
}

int main() {

    // Keys
    const std::string keys{ "abc def ghi jkl" };
    // Search string
    std::string s{ "abcxxxghixxx" };

    // Evaluate
    if (isAnyKeyWordInString(keys, s)) 
        std::cout << "At least one key-word found\n";
    else
        std::cout << "No Keyword found\n";
}
A M
  • 14,694
  • 5
  • 19
  • 44
0

this is fast!!! but you have to keep changing the parameters depending on your requirement, for example: const int keyword_len = (int)keyword.length(); here i'm casting the unsigned long int to int. also i'm not converting my searchable string to upper case before my conditional statement because its faster this way instead of looping through the string when i need to search. when it comes to complexity it's O(n)+k where k<<n, because i'm not comparing the entire keyword size to the searchable string. P.s. sorry for not making this code ideally reusable, but it is intractable.

#include <cctype>
#include <iostream>
#include <string>

int main(){

    std::string searchable_string;
    std::string keyword;
    std::cout<<"enter the searchable string : "<<std::endl;
    std::getline(std::cin, searchable_string);
    std::cout<<"enter the keyword you are looking for : "<<std::endl;
    std::getline(std::cin, keyword);
    
    const int keyword_len = (int)keyword.length();
    const int searchable_string_len = (int)searchable_string.length();
    
    int upper_keyword[keyword_len];
    int i = 0;
    for(;i<keyword_len;){
        upper_keyword[i] = toupper(x); 
        i++;
    }
    i = 0;
    int j;
 keeplooking:
    j = 0;
 comparetokeyword:
    if(upper_keyword[j]==toupper(searchable_string[i+j])){
        j++;
        if(j <= (keyword_len)-1) goto comparetokeyword;
        else {
            //'i' is the relative position of your keyword.
            std::cout<<"found keyword at: "<<std::endl<<i<<std::endl;
            std::cout<<"do you want to keep looking for more(enter '1' for Yes and '0' for No) :"<<std::endl;
            bool x;
            std::cin>>x;
            if(x){
                i++;
                if(i<searchable_string_len) goto keeplooking;
                else goto terminate;
            }
        }
    }
    else{
        i++;
        if(i<searchable_string_len){ 
            goto keeplooking;    
        }
 terminate:
        std::cout<<"reached the end."<<std::endl;
    }
}
  • Your approach works, but a few remarks: 1. Worst-case time complexity is O(n * k). Consider the case where both the keyword and source strings are composed of single character repetition. Maybe average-case under some reasonable assumption is different, but I'm not sure. 2. You use *VLA* for `keyword_len`. Unlike in C, this feature is a *non-standard extension* in C++ supported by certain compilers, but they aren't required to do so. See [this question](https://stackoverflow.com/questions/1887097/why-arent-variable-length-arrays-part-of-the-c-standard) and answers. – YurkoFlisk Aug 01 '22 at 18:50
  • * *VLA* for `upper_keyword`. 3. `std::string::length` returns `size_t`, which is [not required](https://en.cppreference.com/w/cpp/types/size_t) to be `unsigned long int`, so the conversion to `int` you mention can be from a different type. Also, I don't see a need for these conversions at all - you could just use `size_t` for lengths and indexes everywhere, as is usually done. – YurkoFlisk Aug 01 '22 at 19:04