A regex is overkill for this, and likely slower than a plain loop. What you need is commonly known as splitting a string, and the algorithm to do it is quite simple. Here are some answers where you can find implementations of it:
- Splitting a C++ std::string using tokens, e.g. “;”
- How do I iterate over the words of a string?
Here's a simple implementation I wrote:
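#include <string>
#include <vector>
// Splits s on every occurrence of delim. A leading or trailing delimiter
// produces an empty item at that end.
std::vector<std::string> split(const std::string& s, const std::string& delim) {
    std::vector<std::string> result;
    if (delim.empty()) {
        // An empty delimiter would make the loop below spin forever.
        result.push_back(s);
        return result;
    }
    std::string::size_type last_pos = 0;  // start of the current item
    for (auto pos = s.find(delim);
         pos != std::string::npos;
         pos = s.find(delim, last_pos)) {
        result.push_back(s.substr(last_pos, pos - last_pos));
        last_pos = pos + delim.size();
    }
    // Whatever follows the last delimiter is the final item
    result.push_back(s.substr(last_pos));
    return result;
}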
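If your delimiter is always a single character (as "|" is here), you can also let the standard library do the work with std::getline on a std::istringstream. This is a minimal alternative sketch, not a replacement for the function above; the name split_getline is just something I made up for illustration. Note the slightly different edge cases, documented in the comments.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on a single-character delimiter via std::getline.
// Unlike split() above, a trailing delimiter does NOT produce a final empty
// item ("a|" yields {"a"}), and an empty input yields no items at all.
std::vector<std::string> split_getline(const std::string& s, char delim) {
    std::vector<std::string> result;
    std::istringstream stream(s);
    std::string item;
    // getline stops at each delim (consuming it) and fails once nothing is left
    while (std::getline(stream, item, delim)) {
        result.push_back(item);
    }
    return result;
}
```

For this question's input, the leading "|" still gives an empty first item, so you'd drop it the same way as below.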
For the purposes of this answer, here's also an implementation of trim, which we use to remove spaces from the start and end of a string:
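#include <algorithm>
#include <string>
// Removes leading and trailing ' ' characters from s, in place, and returns s.
std::string& trim_inplace(std::string& s) {
    auto not_space = [](char c) { return c != ' '; };
    // Erase everything before the first non-space character
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    // Erase everything after the last non-space character
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
    return s;
}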
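trim_inplace only strips the plain space character. If your data might also contain tabs or line endings (e.g. when read from a file), a variant based on std::isspace is a small change. This is a sketch under that assumption; trim_whitespace_inplace is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical variant of trim_inplace that strips any character
// std::isspace considers whitespace (' ', '\t', '\n', '\r', ...).
std::string& trim_whitespace_inplace(std::string& s) {
    // Cast to unsigned char: passing a negative char to std::isspace is undefined.
    auto not_space = [](char c) {
        return !std::isspace(static_cast<unsigned char>(c));
    };
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
    return s;
}
```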
Now that we've got those out of the way, here's what you want to do:
- Split the string using "|" as a delimiter;
- For each substring:
  - Remove any parts you don't want, if applicable;
  - Trim the result.
Or, in code:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
int main() {
    std::string input = "|(DATA)6S|3E6U22|London UK (2022-09)|.0007|10.8|11|1|0|4|4|20220909";
    // Split the string using "|" as a delimiter
    auto items = split(input, "|");
    // Because of the leading "|", the first string will be an empty string. Let's just get rid of it.
    items.erase(items.begin());
    // If a string ends in a closing parenthesis, remove the trailing parenthesized part.
    // TBH, it's not clear what the requirements for removing this are
    // (seeing as the "(DATA)" part of the first string is not removed as well),
    // so this is what I came up with. If your requirements are different,
    // you can just change the implementation of the lambda below.
    std::for_each(items.begin(), items.end(), [](std::string& s) {
        // Guard against empty strings and a missing '(' before erasing
        if (!s.empty() && s.back() == ')') {
            auto open = s.find_last_of('(');
            if (open != std::string::npos) {
                s.erase(open);  // erase from '(' to the end
            }
        }
    });
    // Trim spaces at start and end (modifies each item in place)
    std::for_each(items.begin(), items.end(), trim_inplace);
    // Print the result.
    for (const auto& item : items) {
        std::cout << "'" << item << "'\n";
    }
}
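Since the rule for removing the parenthesized part is the one most likely to change, it can help to pull it out of the lambda into a named function you can test on its own. This is a sketch of that idea; strip_trailing_parens is a hypothetical name, and it implements the same rule as the lambda above (with guards for an empty string and a missing '(').

```cpp
#include <string>

// Hypothetical helper: if s ends in ')', drop everything from the last '('
// onwards. Strings that don't end in ')' are left untouched. Any space left
// before the '(' is not removed here; trimming takes care of that.
std::string& strip_trailing_parens(std::string& s) {
    if (!s.empty() && s.back() == ')') {
        auto open = s.find_last_of('(');
        if (open != std::string::npos) {
            s.erase(open);  // erase from '(' to the end
        }
    }
    return s;
}
```

You could then pass it straight to std::for_each(items.begin(), items.end(), strip_trailing_parens); before the trimming step.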