C++ getline - Extracting a substring using regex

Question

I have a file with contents like this -

Random text
+-------------------+------+-------+-----------+-------+
|     Data          |   A  |   B   |     C     |   D   |
+-------------------+------+-------+-----------+-------+
|   Data 1          | 1403 |     0 |      2520 | 55.67 |
|   Data 2          | 1365 |     2 |      2520 | 54.17 |
|   Data 3          |    1 |     3 |      1234 | 43.12 |
Some more random text

I want to extract the value in column D of the row Data 1, i.e. the value 55.67 in the example above. I am parsing this file line by line using getline -

while(getline(inputFile1,line)) {
    if(line.find("|  Data 1") != string::npos) {
        subString = //extract the desired value
    }
}

How can I extract the desired substring from the line? Is there any way to extract it using boost::regex?

I would just split on |. Your desired value comes after the fifth | in a line. – HWilmer, Feb 22 '20 at 09:52
First, .find() works if you just search for "Data 1"; you don't have to include all the spaces. – Alxbrla, Feb 22 '20 at 09:55
To get the value: you already know the position of the last '|' before 55.67, so just use .substr(position of that '|') and it will take everything from there to the end. After that, just strip the spaces. (Take a look at the substr documentation.) – Alxbrla, Feb 22 '20 at 09:59
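
The delimiter-based idea from the comments above could be sketched roughly as follows. This is only an illustration, assuming C++17 (for std::optional) and rows shaped like the ones in the question; splitRow and columnD are hypothetical helper names, not something from the original post.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a table row on '|' and strip the
// surrounding spaces/tabs from every cell.
std::vector<std::string> splitRow(const std::string& line) {
    std::vector<std::string> cells;
    std::istringstream is(line);
    std::string cell;
    while (std::getline(is, cell, '|')) {
        const auto first = cell.find_first_not_of(" \t");
        if (first == std::string::npos) {
            cells.emplace_back();  // cell was empty or only whitespace
        } else {
            const auto last = cell.find_last_not_of(" \t");
            cells.push_back(cell.substr(first, last - first + 1));
        }
    }
    return cells;
}

// Hypothetical helper: return the text of column D if `line` is the row
// whose first column equals rowName, otherwise an empty optional.
std::optional<std::string> columnD(const std::string& line,
                                   const std::string& rowName) {
    const auto cells = splitRow(line);
    // cells[0] is the (empty) text before the leading '|', so the
    // columns Data, A, B, C, D sit at indices 1 through 5.
    if (cells.size() >= 6 && cells[1] == rowName) {
        return cells[5];
    }
    return std::nullopt;
}
```

Inside the getline loop from the question, one would call columnD(line, "Data 1") and use the result only when it holds a value; lines such as the separator rows or the random text contain fewer than six cells and are skipped.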

Ted Lyngmo · Accepted Answer · 2020-02-22T10:20:44.590


While regex may have its uses, it's probably overkill for this.

Bring in a trim function and:

char delim;
std::string line, data;
int a, b, c;
double d;

while(std::getline(inputFile1, line)) {
    std::istringstream is(line);
    if( std::getline(is >> delim, data, '|') >>
        a >> delim >> b >> delim >> c >> delim >> d >> delim) 
    {
        trim(data);

        if(data == "Data 1") {
            std::cout << a << ' ' << b << ' ' << c << ' ' << d << '\n';
        }
    }
}
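
The answer does not show the trim function it relies on. One possible version is sketched below; the signature (in-place, taking a std::string&) is an assumption chosen to fit the call trim(data) above, and any equivalent helper would do.

```cpp
#include <string>

// Assumed helper, not from the original answer: remove leading and
// trailing whitespace from s in place.
void trim(std::string& s) {
    const char* whitespace = " \t\r\n";
    const auto first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        s.clear();  // the string was empty or all whitespace
        return;
    }
    const auto last = s.find_last_not_of(whitespace);
    s = s.substr(first, last - first + 1);
}
```

With this in place, the cell "   Data 1          " read by std::getline becomes "Data 1", so the comparison data == "Data 1" succeeds.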


edited Feb 22 '20 at 10:20

answered Feb 22 '20 at 09:59


Thanks. This is quite a simple solution. Regex would make it quite complicated. – Harshu Feb 24 '20 at 07:19

A M · Answer 2 · 2020-02-22T15:01:40.783

Yes, it is easy to extract your substring with a regex. There is no need for Boost; the regex library in the C++ standard library is enough.

The resulting program is simple. We read all lines of the source file in a for loop and use std::regex_match to match each line against our regex. If a line matches, the result is in the std::smatch sm, group 1.

Because we design the regex to capture only a floating-point number, we get exactly what we need, without any surrounding spaces.

We can then convert the captured text to a double and show the result on the screen. Since the capture group only matches text that looks like a number, std::stod has valid input to convert.

The complete program:

#include <iostream>
#include <string>
#include <sstream>
#include <regex>

// Please note. For std::getline, it does not matter, if we read from a
// std::istringstream or a std::ifstream. Both are std::istream's. And because
// we do not have files here on SO, we will use an istringstream as data source.
// If you want to read from a file later, simply create an std::ifstream inputFile1

// Source File with all data
std::istringstream inputFile1{ R"(
Random text
+-------------------+------+-------+-----------+-------+
|     Data          |   A  |   B   |     C     |   D   |
+-------------------+------+-------+-----------+-------+
|   Data 1          | 1403 |     0 |      2520 | 55.67 |
|   Data 2          | 1365 |     2 |      2520 | 54.17 |
|   Data 3          |    1 |     3 |      1234 | 43.12 |
Some more random text)" 
};

// Regex for finding the desired data
const std::regex re(R"(\|\s+Data 1\s+\|.*?\|.*?\|.*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|)");

int main() {

    // The result will be in here
    std::smatch sm;

    // Read all lines of the source file
    for (std::string line{}; std::getline(inputFile1, line);) {

        // If we found our matching string
        if (std::regex_match(line, sm, re)) {

            // Then extract the column D info
            double data1D = std::stod(sm[1]);

            // And show it to the user.
            std::cout << data1D << "\n";
        }
    }
}

For most people, the tricky part is defining the regular expression. Online regex testers and debuggers help here; they also give a breakdown of the regex with an understandable explanation.

For our regex

\|\s+Data 1\s+\|.*?\|.*?\|.*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|

we get the following explanation:

\|
    matches the character | literally
\s+
    matches any whitespace character (equal to [\r\n\t\f\v ])
    + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
Data 1
    matches the characters Data 1 literally (case sensitive)
\s+
    matches any whitespace character (equal to [\r\n\t\f\v ])
    + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\|
    matches the character | literally
.*?
    matches any character (except for line terminators)
    *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\|
    matches the character | literally
.*?
    matches any character (except for line terminators)
    *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\|
    matches the character | literally
.*?
    matches any character (except for line terminators)
    *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\|
    matches the character | literally
\s*
    matches any whitespace character (equal to [\r\n\t\f\v ])
    * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

1st Capturing Group ([-+]?[0-9]*\.?[0-9]+)
    [-+]? matches an optional sign
    [0-9]* matches zero or more digits
    \.? matches an optional literal .
    [0-9]+ matches one or more digits

\s*
    matches any whitespace character (equal to [\r\n\t\f\v ])
    * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\|
    matches the character | literally

By the way, a safer (stricter) regex, which also requires columns A, B and C to contain only digits, would be:

\|\s+Data 1\s+\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|
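
As a rough sketch of how this stricter pattern could be used, the function below wraps the match and the conversion. It assumes C++17 (for std::optional), and the name extractData1D is made up for this illustration.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical wrapper: return column D of the "Data 1" row, or an
// empty optional if the line is not a well-formed "Data 1" row.
std::optional<double> extractData1D(const std::string& line) {
    // Columns A, B and C must consist of digits only (\d+).
    static const std::regex re(
        R"(\|\s+Data 1\s+\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*?\d+\s*?\|\s*([-+]?[0-9]*\.?[0-9]+)\s*\|)");

    std::smatch sm;
    // regex_match requires the whole line to match the pattern.
    if (std::regex_match(line, sm, re)) {
        return std::stod(sm[1].str());
    }
    return std::nullopt;
}
```

Unlike the first regex, a row such as "|   Data 1   | abc | 0 | 2520 | 55.67 |" is rejected here, because abc does not match \d+.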
