How do you match characters separated by a specific character, let's say ';', and ignore the spaces in front of and behind each match but retain the ones inside?

(word1); (word2) ; (word31 word32) Parentheses only denote the matches.

So far I have \s*([a-zA-Z0-9\s]*[a-zA-Z0-9]+)\s*[;] but I don't know how to make the words repeat. It should also be capable of handling empty words, so something like (word);;(word), (word); ;(word) or (word);(word);. Since it ignores spaces, the first two should be equivalent.

Well, the main problem is that I don't know how to handle the split together with the two options of a legit word and an empty word, since my pattern requires at least 1 symbol.

Alternatively, it could be solved if I allowed repeated separators with spaces in between, but that loops back to the fact that I don't know how to handle the splitting.

Edit: Also, I intend to use it in C++.

Edit: This is probably it, can I get a fact check? \s*([a-zA-Z0-9\s]*[a-zA-Z0-9]+)[;]*\s*[;]*

Zerg Overmind

2 Answers

Long regexps with nested quantifiers (even when written with the unroll-the-loop principle in mind) often cause issues with std::regex, so a splitting approach seems best in this situation.

Here is a C++ demo:

#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> strings;
    std::string s = "word1; word2  ; word31 word32";
    // Separator: a semicolon together with any whitespace around it
    std::regex re(R"(\s*;\s*)");
    // Submatch index -1 iterates over the text between separator matches
    std::regex_token_iterator<std::string::iterator> it(s.begin(), s.end(), re, -1);
    decltype(it) end{};
    while (it != end){
        strings.push_back(*it++);
    }
    for (const auto& token : strings){
        std::cout << "'" << token << "'" << std::endl;
    }
    return 0;
}

Output:

'word1'
'word2'
'word31 word32'

The pattern is defined in R"(\s*;\s*)": it matches a semicolon together with zero or more whitespace characters on either side.
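To check how the same pattern treats the empty words from the question, the splitting loop can be wrapped in a small helper. This is only a sketch: split_fields is a hypothetical name introduced here for illustration, not something from the answer above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for this example): splits `input` on
// semicolons, swallowing any whitespace that surrounds each separator.
std::vector<std::string> split_fields(const std::string& input) {
    static const std::regex separator(R"(\s*;\s*)");
    // Submatch index -1 selects the text between separator matches.
    std::sregex_token_iterator first(input.begin(), input.end(), separator, -1);
    std::sregex_token_iterator last;  // default-constructed end-of-sequence iterator
    // Each sub_match converts to std::string when the vector is filled.
    return std::vector<std::string>(first, last);
}
```

With this helper, split_fields("a;;b") and split_fields("a; ;b") should both yield the three tokens "a", "" and "b", so an empty word between two separators is kept. A trailing separator, as in "a;b;", does not produce a final empty token, because the iterator only reports a non-empty remainder after the last match.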

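NOTE: This approach might require trimming whitespace from the input string first, because whitespace at the very start or end of the input is not adjacent to a semicolon and so stays in the first or last token. See What's the best way to trim std::string? for various approaches to stripping leading/trailing whitespace.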
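As one possible way to do that trimming, here is a minimal sketch using only std::string member functions. The name trim_copy and the chosen set of whitespace characters are assumptions made for this example; the linked question discusses other options.

```cpp
#include <string>

// Hypothetical helper (name chosen for this example): returns a copy of
// `text` without leading or trailing whitespace.
std::string trim_copy(const std::string& text) {
    // Assumed whitespace set: space, tab, newline, CR, form feed, vertical tab.
    const char* whitespace = " \t\n\r\f\v";
    const std::string::size_type first = text.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";  // the string is empty or contains only whitespace
    }
    const std::string::size_type last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}
```

Calling trim_copy on the input before splitting leaves inner spaces such as the one in "word31 word32" untouched and only removes whitespace at the two ends.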

Wiktor Stribiżew

Try this:

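#include <iostream>
#include <string>
#include <regex>

int main()
{
    std::string s = "  w1 w2 w3;   word1 word2    ; word1  ;  ";

    // Compile the pattern once instead of on every loop iteration.
    // The \b anchors pin both ends of the match to word boundaries,
    // which drops the spaces around each group of words.
    const std::regex re(R"(\b([a-z0-9\s]+)\b)", std::regex::icase);

    // Search repeatedly, continuing with the text after each match.
    for (std::smatch m; std::regex_search(s, m, re); s = m.suffix())
    {
        std::cout << m[1] << std::endl;
    }

    return 0;
}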

Prints:

w1 w2 w3
word1 word2
word1

Killzone Kid