
I am using a vector to hold the names of the files in a directory. I only want files of a certain type, so I am trying to use find to detect whether a file name has the .conllu extension. How can I check whether a string contains "conllu"?

void read_directory(const std::string& name, stringvec& v)
{
    std::string pattern(name);
    pattern.append("\\*");
    WIN32_FIND_DATA data;
    HANDLE hFind;

    if ((hFind = FindFirstFile(pattern.c_str(), &data)) != INVALID_HANDLE_VALUE) {
        // FindFirstFile already returns the first entry, so record it before advancing.
        do {
            v.push_back(data.cFileName);
        } while (FindNextFile(hFind, &data) != 0);
        FindClose(hFind);
    }
}
std::vector<std::string> v;
std::vector<std::string>::iterator it;
read_directory("path", v);
it = find(v.begin(), v.end(), ".conllu");
if (it != v.end())
    std::cout << "Element found in myvector: " << *it << '\n';

Example file names in the vector:

.gitignore
CONTRIBUTING.md
el_gdt-ud-dev.conllu

2 Answers


If you want to check if a std::vector<std::string> contains a specific std::string, you just have to do something like:

bool contains(const std::string & word, const std::vector<std::string> & set)
{
    bool found(false);
    for(size_t i = 0; !found && (i < set.size()); ++i)
    {
        if(set[i] == word)
            found = true;
    }
    return found;
}
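
The same exact-match check can also be written with std::find from <algorithm>. This is a minimal sketch; the name contains_element is hypothetical and not part of the code above. Because it compares whole elements, searching for ".conllu" does not match "el_gdt-ud-dev.conllu", which is why the std::find call in the question finds nothing.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if some element of `items` equals `word` exactly.
// std::find compares each element with operator==, so partial matches do not count.
bool contains_element(const std::string& word, const std::vector<std::string>& items)
{
    return std::find(items.begin(), items.end(), word) != items.end();
}
```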

Checking whether a std::string contains a given substring is more involved if you implement the search by hand.
I did it this way:

bool contains(const std::string & pattern, const std::string & str)
{
    bool found(false);

    bool ongoing(false);
    size_t cursor(0);
    for(size_t i = 0; (!pattern.empty()) && !found && (i < str.length()); ++i)
    {
        if(ongoing)
        {
            if(str[i] == pattern[0])
            {
                cursor = 1;
            }
            else if(str[i] == pattern[cursor])
            {
                if(cursor == pattern.length()-1)
                    found = true;
                else
                    ++cursor;
            }
            else
            {
                ongoing = false;
                cursor = 0;
            }
        }
        else
        {
            if(str[i] == pattern[0])
            {
                if(pattern.size() == 1)
                    found = true;
                else
                {
                    ongoing = true;
                    ++cursor;
                }
            }
        }
    }

    return found;
}

I tested it in all cases and it worked successfully.

I hope it can help.


EDIT: The standard library already provides this kind of search: std::string::find returns std::string::npos when the substring is absent. But, if we want to implement it ourselves, this is a way to do it.
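
For comparison, here is a minimal sketch of the same check built on std::string::find. The name contains_substring is hypothetical. One difference from the hand-written versions in this answer: an empty pattern is reported as found, because find returns position 0 for it.

```cpp
#include <string>

// Hypothetical helper: true if `pattern` occurs anywhere inside `str`.
// std::string::find returns std::string::npos when there is no match.
// An empty pattern counts as found, since find("") returns 0.
bool contains_substring(const std::string& str, const std::string& pattern)
{
    return str.find(pattern) != std::string::npos;
}
```

For the file names in the question, contains_substring("el_gdt-ud-dev.conllu", "conllu") is true and contains_substring("CONTRIBUTING.md", "conllu") is false.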


EDIT 2: I realized that my string search function had a problem with patterns that contain the same letter multiple times.
Therefore, I have written a simpler and more concise implementation of this function that handles those cases:

bool contains(const std::string & str, const std::string & pattern)
{
    bool found(false);

    // Use <= so that a pattern equal to the whole string is also found.
    if(!pattern.empty() && (pattern.length() <= str.length()))
    {
        for(size_t i = 0; !found && (i <= str.length()-pattern.length()); ++i)
        {
            if((str[i] == pattern[0]) && (str.substr(i, pattern.length()) == pattern))
            {
                found = true;
            }
        }
    }

    return found;
}
Fareanor

You need to search each string in the vector for the substring .conllu. I would suggest a loop and std::string::find.

#include <vector>
#include <string>
#include <iostream>

int main() {
    std::vector<std::string> v = { "nope", "yes.conllu", "also.conllu", "nothere" };

    for (auto& str : v) {
        if (str.find(".conllu") != std::string::npos) {
            std::cout << "Found .conllu in " << str << std::endl;
        }
    }
}
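
Note that find matches ".conllu" anywhere in the name, so a name such as "notes.conllu.bak" (a made-up example) would also be reported. If only names that end in .conllu are wanted, a suffix comparison is stricter. The sketch below assumes that requirement; has_suffix and filter_by_suffix are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: true if `name` ends with `suffix`.
bool has_suffix(const std::string& name, const std::string& suffix)
{
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: returns only the names that end with `suffix`.
std::vector<std::string> filter_by_suffix(const std::vector<std::string>& names,
                                          const std::string& suffix)
{
    std::vector<std::string> result;
    std::copy_if(names.begin(), names.end(), std::back_inserter(result),
                 [&suffix](const std::string& n) { return has_suffix(n, suffix); });
    return result;
}
```

Calling filter_by_suffix(v, ".conllu") on the vector filled by read_directory would keep el_gdt-ud-dev.conllu and drop .gitignore and CONTRIBUTING.md.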
super