
I'm trying to break apart a path formed by a series of folder names:

"/foldera/folderb/folderc"

into

"/foldera" "/folderb" "/folderc"

But I can't figure out how to do this with `std::regex`:

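#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::regex exp("^(/[a-zA-Z0-9-_]+)+");
    std::smatch res;
    std::string str = "/uuu/kkk";

    if (std::regex_search(str, res, exp))
    {
        // res[0] is the whole match; res[1] holds only the group's last repetition
        std::cout << res[0] << ";" << res[1] << std::endl;  // "/uuu/kkk;/kkk"
    }
    std::cout << std::endl;
}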

It only ever gives me the whole string (`res[0]`) or the last segment, `"/kkk"` (`res[1]`); I never get a match for `"/uuu"`.

I know the problem can be solved by splitting the string, but I'm interested in a `std::regex` solution here, because this is doable with JavaScript and Qt's regex classes; I just don't know how to do it with `std::regex`.

P.S. The following doesn't work either; it finds only one match, the whole string:

#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main()
{
    const std::string s = "/uuu/kkk";

    std::regex words_regex("(/[a-zA-Z0-9-_]+)+");
    auto words_begin =
        std::sregex_iterator(s.begin(), s.end(), words_regex);
    auto words_end = std::sregex_iterator();

    // The outer + lets a single match swallow the whole path: prints "Found 1 words:"
    std::cout << "Found "
              << std::distance(words_begin, words_end)
              << " words:\n";

    for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
        std::smatch match = *i;
        std::string match_str = match.str();
        std::cout << match_str << '\n';  // prints "/uuu/kkk"
    }
}
Bill Yan
  • Use `std::sregex_iterator` instead: https://www.regular-expressions.info/stdregex.html – wp78de Sep 11 '19 at 17:27
  • That doesn't work; I've edited the question. – Bill Yan Sep 11 '19 at 19:04
  • Try `std::regex words_regex("(/[a-zA-Z0-9-_]+)");` – wp78de Sep 11 '19 at 20:23
  • This won't work, because it also matches `/afwef#/awefg`, which is an invalid path by my definition. – Bill Yan Sep 12 '19 at 02:13
  • To use a regex iterator, you have to remove the repetition from the regex itself and then iterate outside the regex, so it should work with `std::regex words_regex("(/[a-zA-Z0-9-_]+)");` – joanis Sep 12 '19 at 02:14
  • By the way, `/asdf//qwer/` is a valid path too, that your original regex won't allow. It's equivalent to `/asdf/qwer`, but it's still valid. – joanis Sep 12 '19 at 02:16
  • By my definition, it's not a valid path. My definition of a path is formalized by `(/[a-zA-Z0-9-_]+)+`; the question is about how to get all captures of `(/[a-zA-Z0-9-_]+)+`. – Bill Yan Sep 12 '19 at 13:52
  • It's still not clear to me. Do you want some sort of overlapping matches? – wp78de Sep 12 '19 at 16:12
  • Given the string `/afwef#/awefg`, your solution accepts it and returns `"/afwef"` and `"/awefg"`. Correct logic, however, should reject the string as invalid input. – Bill Yan Sep 13 '19 at 18:17
  • I think you want two things at once. Validate the string path and get the parts of the path separately. I came up with [this](https://regex101.com/r/Pps3lb/2) regex but recommend against it. Validate first then split. – wp78de Sep 16 '19 at 18:28
