I wrote a function findFile that checks whether a file matching the pattern file_name_regex exists in the directory dir_name. I tested it in Coliru:

#include <string>
#include <iostream>
#include <boost/regex.hpp>
#include <boost/filesystem.hpp>

namespace fs = boost::filesystem;

bool findFile(const std::string & dir_name, const std::string & file_name_regex)
{
    fs::path p(dir_name);
    if (!exists(p))
        return false;

    boost::regex file_regex(file_name_regex, boost::regex::basic);

    fs::directory_iterator end_itr;
    for (fs::directory_iterator itr(p);itr != end_itr; ++itr )
    {   
        if (!fs::is_directory(itr->path()))
        {               
            boost::sregex_iterator it(itr->path().filename().string().begin(),
                                   itr->path().filename().string().end(), 
                                   file_regex);
            boost::sregex_iterator end;
            for (; it != end; ++it){
                std::cout << it->str() << std::endl;
            }
        }   
        else {
            continue;
        }
    }   
    return false;
}

int main()
{
    findFile("/", "a.out" );
}

Compile and run it with the command:

g++ -std=c++11 -O2 -Wall -lboost_system -lboost_filesystem -lboost_regex main.cpp && ./a.out

It should print out:

a.out

But it prints unexpected output:

.out

It is based on the solution in "C++ Regular Expressions with Boost Regex".

I also reduced it to a simple test, again in Coliru:

#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string text("a.out");
    const char * pattern = "a.out";    
    boost::regex ip_regex(pattern);

    boost::sregex_iterator it(text.begin(), text.end(), ip_regex);
    boost::sregex_iterator end;
    for (; it != end; ++it) {
        std::cout << it->str() << "\n";
        // v.push_back(it->str()); or something similar     
    }
}

It prints out the expected word a.out.

So what is wrong with my code?

leiyc
  • You really need to study regular expressions more closely. The dot `.` has a special meaning. Are you sure you want regular expressions (which are often overkill) rather than [*globbing*](https://en.wikipedia.org/wiki/Glob_(programming))? – Some programmer dude Aug 14 '18 at 11:47
  • @Someprogrammerdude, yes, my code base is C++, and the name pattern may not be that simple. Actually I want to catch files matching the basic regex "^test_*" pattern. – leiyc Aug 14 '18 at 12:01
  • `itr->path().filename().string().compare(0, 5, "test_"s)` is all you need in that case. Also, there is a built-in std::regex, no need to use boost. – rustyx Aug 14 '18 at 12:11
  • @rustyx, yes, it is. Good suggestion-:). We use Boost mainly for `boost::filesystem` (our code base supports C++11, but std::filesystem requires C++17). I also saw that Boost has regex support, so I thought, why not write a more general function... Oh, std::regex is also available in C++11. – leiyc Aug 14 '18 at 12:24
  • `"test_*"` is a *globbing* pattern and not a regular expression. You need to translate it to the regular expression `"test_.*"`. Or, for such a simple pattern, check whether the leading five-character sub-string is equal to `"test_"`. – Some programmer dude Aug 14 '18 at 15:11
  • @Someprogrammerdude, yes it is, so I use `boost::regex::basic` for this. – leiyc Aug 15 '18 at 06:22
  • @Someprogrammerdude, a _globbing_ pattern seems to meet our requirement well. I found an example at https://stackoverflow.com/questions/8401777/simple-glob-in-c-on-unix-system – leiyc Aug 16 '18 at 08:35
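
To illustrate the two suggestions from the comments above, here is a minimal sketch using only the standard library (std::regex rather than boost::regex, so it builds without Boost). The function names `hasPrefix` and `matchesTestGlob` are hypothetical, chosen for this example only. Note that as a regular expression `test_*` means "test" followed by zero or more underscores, which is why the glob `test_*` has to be written as `test_.*`.

```cpp
#include <regex>
#include <string>

// Plain prefix check: for a fixed prefix no regex is needed.
// compare() returns 0 only when the first prefix.size() characters
// of name are exactly equal to prefix.
bool hasPrefix(const std::string& name, const std::string& prefix)
{
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Regex translation of the glob "test_*": the literal "test_"
// followed by any sequence of characters.
bool matchesTestGlob(const std::string& name)
{
    static const std::regex re("test_.*");
    // regex_match succeeds only if the whole name matches the pattern.
    return std::regex_match(name, re);
}
```

With these definitions, `hasPrefix("test_foo.txt", "test_")` and `matchesTestGlob("test_foo.txt")` both return true, while a name such as `mytest_foo` is rejected by both.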

1 Answer

You've got UB due to a dangling pointer. The temporary itr->path().filename().string() is destroyed at the end of the following statement:

        boost::sregex_iterator it(itr->path().filename().string().begin(),
                               itr->path().filename().string().end(), 
                               file_regex);

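So by the time the sregex_iterator is actually used, the iterators returned by begin() and end() point to garbage. (The two filename() calls also create two separate temporaries, so the pair never formed a valid range in the first place.)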

You need to hoist the temporary string out into a separate variable to extend its lifetime:

        std::string s = itr->path().filename().string();
        boost::sregex_iterator it(s.begin(), s.end(), file_regex);
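
As a self-contained illustration of the same fix, here is a minimal sketch that uses std::regex instead of boost::regex so that it compiles without Boost. The names `makeFileName` and `collectMatches` are hypothetical and not part of the question's code: `makeFileName` merely stands in for `itr->path().filename().string()` by returning a string by value.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for itr->path().filename().string():
// every call returns a fresh std::string by value.
std::string makeFileName()
{
    return "a.out";
}

// Returns every match of pattern found in the file name.
std::vector<std::string> collectMatches(const std::string& pattern)
{
    const std::regex re(pattern);

    // Hoist the temporary into a named variable so the iterators
    // handed to sregex_iterator stay valid for the whole loop.
    const std::string name = makeFileName();

    std::vector<std::string> matches;
    for (std::sregex_iterator it(name.begin(), name.end(), re), end;
         it != end; ++it)
    {
        matches.push_back(it->str());
    }
    return matches;
}
```

Calling `collectMatches("a.out")` yields a vector containing the single string `a.out`, because both iterators now refer to the same string, which outlives the loop.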
rustyx
  • I ran the program many times, but the output is always `.out` without any crash in Coliru. Is there any explanation for this? – leiyc Aug 15 '18 at 06:18
  • UB means you can't predict what the result will be when writing your application; it doesn't mean random behavior. In your case there are no external factors that influence the behavior, so the issue is 100% repeatable. It may change if you recompile on a different platform or use different optimization flags. Read [this](https://www.nayuki.io/page/undefined-behavior-in-c-and-cplusplus-programs) for more info. – rustyx Aug 15 '18 at 08:00