|(DATA)6S|3E6U22|London UK (2022-09)|.0007|10.8|11|1|0|4|4|20220909

I want to extract each value that follows a | and assign each one to a variable. For the third item, though, I want only "London UK": the text from the third | up to the first (, without the space before the (.

/\|([^)]+)\s

This is the closest I could get, but it catches |3E6U22|London UK instead of just London UK.

sshashank124
maynull
  • Why do you need "regex" for a simple task? – Soner from The Ottoman Empire Jan 04 '20 at 05:34
  • Perhaps this? https://regex101.com/r/8239aO/1 – Nick Jan 04 '20 at 06:31
  • @Nick you should post it as answer. – Soner from The Ottoman Empire Jan 04 '20 at 06:41
  • @snr There's not enough information in the question to be sure that it's what OP needs. If OP confirms that it works I will post an answer. – Nick Jan 04 '20 at 06:43
  • @yunnosch Thank you for the comment. I've solved the problem thanks to Nick. I voted his comment, but how can I confirm his comment? – maynull Jan 04 '20 at 13:27
  • @snr I'm new to C++ and I thought that regex would make my code more readable. If you can think of any simple ideas for this without using regex, please let me know. – maynull Jan 04 '20 at 13:33
  • Why not split on `|`? – Toto Jan 04 '20 at 13:34
  • By writing "I've solved the problem thanks to Nick." you confirmed their comment (it is not a technical thing like the "accept" button for answers). If you wait a little, @Nick will probably notice. But maybe, in the presence of other answers, they might not actually make an answer now. You could consider accepting a different answer (after waiting a little, to give nick a chance of answering in their favorite time zone). – Yunnosch Jan 04 '20 at 17:31
  • @Yunnosch Thank you - I appreciate your comments. OP has written up a good answer to the question (and got others) which is all that is required so I'm happy with the outcome. I don't mind missing out on a few rep points. – Nick Jan 04 '20 at 22:47
  • maynull You now get to pick one of the existing answers to accept. Or you can even write and accept your own answer, based on nicks comment (now that they basically stated not to claim it). That allows you to express which one was most helpful. – Yunnosch Jan 04 '20 at 23:04

3 Answers

A regex is likely slower than necessary (and a bit overkill) for this. What you need is commonly known as splitting a string, and the algorithm to do it is quite simple. Here are some answers where you can find implementations for it:

Splitting a C++ std::string using tokens, e.g. “;”

How do I iterate over the words of a string?

Here's a simple implementation I wrote:

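#include <string>
#include <vector>

// Splits s on every occurrence of delim (which must be non-empty).
std::vector<std::string> split(const std::string& s, const std::string& delim) {
    std::vector<std::string> result;
    std::size_t last_pos = 0;  // start of the current token
    for (auto pos = s.find(delim);
              pos != std::string::npos;
              pos = s.find(delim, last_pos)) {
        result.emplace_back(s.begin()+last_pos, s.begin()+pos);
        last_pos = pos+delim.size();
    }
    // Whatever follows the last delimiter is the final token.
    result.emplace_back(s.begin()+last_pos, s.end());
    return result;
}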

For the purposes of this answer, here's also an implementation of trim, which we use to remove spaces from the start and end of a string:

#include <algorithm>
#include <string>

// Removes spaces (only ' ', not tabs or newlines) from both ends of s.
std::string& trim_inplace(std::string& s) {
    auto not_space = [](char c) {return c != ' ';};
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
    return s;
}

Now that we got those out of the way, here's what you wanna do:

  • Split the string using | as a delimiter;
  • For each substring:
    • Remove any parts you don't want, if applicable
    • Trim the result

Or, in code:

std::string input = "|(DATA)6S|3E6U22|London UK (2022-09)|.0007|10.8|11|1|0|4|4|20220909";

// Split the string using "|" as a delimiter
auto items = split(input, "|");

// Because of the leading "|", the first string will be an empty string. Let's just get rid of it.
items.erase(items.begin());

// If a string ends in a closing parenthesis, remove everything from its last
// opening parenthesis to the end.
// TBH, it's not clear what the requirements for removing this are
// (seeing as the "(DATA)" part of the first string is not removed as well),
// so this is what I came up with. If your requirements are different,
// you can just change the implementation of the lambda below.
std::transform(items.begin(), items.end(), items.begin(), [](std::string& s) {
    // Guard against empty strings and a ')' without a matching '('.
    const auto open = s.find_last_of('(');
    if (!s.empty() && s.back() == ')' && open != std::string::npos) {
        s.erase(open);
    }
    return s;
});

// Trim spaces at start and end
std::transform(items.begin(), items.end(), items.begin(), trim_inplace);

// Print the result.
for (auto& item : items) {
    std::cout << "'" << item << "'\n";
}
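
Since the question only asks for the parenthesised part to be dropped from the third field, a narrower variant is possible. The following is a minimal sketch rather than the code above: it splits with std::getline instead of split, and parse_record is a hypothetical name. It assumes that the line starts with | and that only the third field (index 2) needs cutting at the first " (" sequence.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a '|'-separated record into fields and
// strips the " (...)" suffix from the third field only.
std::vector<std::string> parse_record(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, '|')) {
        fields.push_back(field);
    }
    // A leading '|' produces an empty first field; drop it.
    if (!fields.empty() && fields.front().empty()) {
        fields.erase(fields.begin());
    }
    // Assumption: only the third field carries a " (...)" suffix.
    if (fields.size() > 2) {
        const auto cut = fields[2].find(" (");
        if (cut != std::string::npos) {
            fields[2].erase(cut);  // drops the space and everything after it
        }
    }
    return fields;
}
```

With the sample line, parse_record(input)[2] is "London UK" and the other ten fields are left unchanged. Note that std::getline does not report an empty field after a trailing |, so this sketch would need adjusting if that case matters.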


Not a real meerkat

Thank you Nick! This is an example of the solution:

#include <string>
#include <regex>
#include <iostream>

int main()
{
    std::string newLine = "|(DATA)6S|3E6U22|London UK (2022-09)|.0007|10.8|11|1|0|4|4|20220909";
    std::regex reg(R"(\|(.[^(|]*)(?=\||\s\(|$))");

    std::sregex_iterator it(newLine.begin(), newLine.end(), reg);
    std::sregex_iterator end;


    while(it != end)
    {
        std::smatch m = *it;        
        std::cout << m.str(1) << std::endl;
        ++it;
    }

    return 0;
}

/* RESULT
(DATA)6S
3E6U22
London UK
.0007
10.8
11
1
0
4
4
20220909
*/
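
To assign the values to variables rather than print them, the same iteration can collect the captures into a container. This is a minimal sketch that reuses the pattern above unchanged; extract_fields is a hypothetical name, and the sketch assumes the line contains no empty fields (such as "||").

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns capture group 1 of every match, in order.
std::vector<std::string> extract_fields(const std::string& line) {
    // Same pattern as in the program above.
    static const std::regex reg(R"(\|(.[^(|]*)(?=\||\s\(|$))");
    std::vector<std::string> fields;
    for (std::sregex_iterator it(line.begin(), line.end(), reg), end;
         it != end; ++it) {
        fields.push_back(it->str(1));
    }
    return fields;
}
```

For the sample line, extract_fields(newLine)[2] holds "London UK". Numeric fields could then be converted with std::stod or std::stoi as needed.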
maynull
  • You're welcome. You can accept your own answer (once the time limit is over) and you should do that. – Nick Jan 04 '20 at 22:44

I think regex is overkill for this case, and I wouldn't use it.

Here are my two cents.

assume the string is in "str" and its first character is '|'

for i = 1 until the null character
    if str[i] is not '|'
        print str[i]
    else
        print newline

If you want to store the fields in a container such as a 2D array, keep two extra counters. The first indexes the array's first dimension, one row per field. The second indexes the character position within the current field: it starts at 0 and runs until you hit a '|' character. When you hit one, advance the first counter and reset the second to zero.

You could also do this with pointers, but that is more work for you; better still, use an existing split function.
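
The loop described above could look like this in C++. This is only a sketch under the same assumption that the first character is '|'. The name split_by_pipe is hypothetical, and a std::vector<std::string> stands in for the 2D array so that no manual counters are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collects the text between '|' characters.
std::vector<std::string> split_by_pipe(const std::string& str) {
    // Assumption from the text: the string starts with '|'.
    assert(!str.empty() && str[0] == '|');
    std::vector<std::string> fields;
    std::string current;
    for (std::size_t i = 1; i < str.size(); ++i) {
        if (str[i] != '|') {
            current += str[i];          // still inside the current field
        } else {
            fields.push_back(current);  // '|' closes the current field
            current.clear();
        }
    }
    fields.push_back(current);          // the last field has no closing '|'
    return fields;
}
```

This keeps the third field as "London UK (2022-09)"; removing the parenthesised part would be a separate step.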