This is a question about what I'm doing wrong with regex_match. It seems that escaped characters such as \+, \. and \d (the plus sign, the decimal point and any digit) are not matching. Also, only greedy matching seems to work (so no '?').

I have a program written on Ubuntu 18.04 and built with g++ (gcc 7.5.0). The goal is to parse the temperature out of the output of the sensor utility. Below is a truncated example that illustrates my issue. Why isn't the temperature being parsed?

#include <iostream>
#include <stdexcept>
#include <regex>

char sensorData[][250]={
"radeon-pci-0008\n",
"Adapter: PCI adapter\n",
"temp1:        +47.0°C  (crit = +120.0°C, hyst = +90.0°C)\n",
"\n",
"fam15h_power-pci-00c4\n",
"Adapter: PCI adapter\n",
"power1:        7.79 W  (interval =   0.01 s, crit =  15.05 W)\n",
"\n",
"k10temp-pci-00c3\n",
"Adapter: PCI adapter\n",
"temp1:        +47.1°C  (high = +70.0°C)\n",
"                       (crit = +105.0°C, hyst = +104.0°C);\n"};

float extractTemperature(std::string sensorData){
        //std::regex e("\\+(.)*?C");  //nope
        //std::regex e("\\d\\d(.)*?C"); //nope
        //std::regex e("temp1(.)*C"); //yes all day long
        std::regex e("\\+(\\d\\d\\.\\d)(.)*?C"); //nope
        std::smatch match;
        float temperature = 0.0;

        std::cout << "evaluating: " << sensorData;
        if (sensorData.length()>1){
                //something about \n confuses regex. Strip last char
                sensorData = sensorData.substr(0,sensorData.length()-1);
                if(std::regex_match(sensorData,match,e)){
                        for (unsigned i=0; i<match.size(); ++i){
                                std::cout<<"["<<match[i]<<"]";
                        }
                } //else std::cout << "no match";
        }//else: only one char... meh, skipping

        //assumes our regex parses out a /d/d/./d
        try{
                float temperature = std::stof(sensorData);
        }catch(...){};  
        return temperature;
}

int main(int argc, char *argv[])
{
        for (int line = 0; line < 12; line++){
                std::cout << "temp extracted is: " << extractTemperature(std::string(sensorData[line])) << std::endl;
        }
        return 0;
}

When I try my expression at regex101.com I can see that the expression:

\+(\d\d\.\d)(.)*?C

matches the string:

"temp1:        +45.0°C  (crit = +120.0°C, hyst = +90.0°C"

and yields two groups, one of which is the temperature "45.0"! Fantastic, but I can't replicate this in C++.

If I compile trivially with:

g++ 1.cc -o extractTemp

The output is:

temp extracted is: evaluating: radeon-pci-0008
0
temp extracted is: evaluating: Adapter: PCI adapter
0
temp extracted is: evaluating: temp1:        +47.0°C  (crit = +120.0°C, hyst = +90.0°C)
0
temp extracted is: evaluating: 
0
temp extracted is: evaluating: fam15h_power-pci-00c4
0
temp extracted is: evaluating: Adapter: PCI adapter
0
temp extracted is: evaluating: power1:        7.79 W  (interval =   0.01 s, crit =  15.05 W)
0
temp extracted is: evaluating: 
0
temp extracted is: evaluating: k10temp-pci-00c3
0
temp extracted is: evaluating: Adapter: PCI adapter
0
temp extracted is: evaluating: temp1:        +47.1°C  (high = +70.0°C)
0
temp extracted is: evaluating:                        (crit = +105.0°C, hyst = +104.0°C);
0

So, why isn't my temperature parsed on the "temp1:" lines?

Note: I don't think the special chars need single or triple escaping (e.g. \d or \\\d); that just causes the compiler to complain about an unrecognized character.

rawrex
Quinn Carver

1 Answer

You should use a raw string:

Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. - cppreference.com

In your case it would look like this:

// ... 
std::regex expression(R"(\d+\.\d+°C)");
// ...
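
To make the escaping point concrete, here is a minimal sketch comparing the doubled-backslash spelling from the question with a raw string literal. The names escapedPattern, rawPattern, samePatternText and firstHit are hypothetical and only for illustration. As far as the compiler is concerned both literals hold the same characters; the raw form is simply easier to read and to compare against what you typed into regex101.

```cpp
#include <regex>
#include <string>

// Hypothetical names, for illustration only.
// The same pattern spelled twice: with doubled backslashes and as a raw string literal.
const std::string escapedPattern = "\\+(\\d\\d\\.\\d)";
const std::string rawPattern = R"(\+(\d\d\.\d))";

// True when both literals contain exactly the same characters.
bool samePatternText() {
    return escapedPattern == rawPattern;
}

// Returns the text of capture group 1 for the first hit, or an empty string if there is none.
std::string firstHit(const std::string& pattern, const std::string& line) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(line, m, re)) {
        return m[1].str();
    }
    return std::string();
}
```

Calling firstHit with either pattern on a line such as "temp1: +47.0 C" should give the same captured text, since the regex engine receives identical input in both cases.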

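Overall, your code had some overcomplications (the explicit \n handling, looping over the match) and bugs (temperature is re-declared inside the try block, so the outer variable stays 0, and std::stof is applied to the whole line rather than to the match). So, you should aim for the following form instead: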

double extractTemperature(const std::string& sensorData) {
    // Raw string literal: the backslashes reach the regex engine unchanged.
    static const std::regex expression(R"(\d+\.\d+°C)");
    std::smatch match;
    // regex_search looks for the pattern anywhere in the line
    // and returns true only if it found it.
    if (std::regex_search(sensorData, match, expression)) {
        std::cout << '[' << match[0] << ']' << '\n';
        const std::string number = match[0];
        try {
            // stod stops at the first character that is not part of the number.
            return std::stod(number);
        } catch (...) {
            std::cout << "Can't convert: " << number << '\n';
        }
    }
    return 0; // Default return: no temperature found or conversion failed
}

int main() {
    for (std::size_t line = 0; line < 12; ++line){
        double value = extractTemperature(std::string(sensorData[line]));
        if (value)
            std::cout << "Extracted: " << value << '\n';
    }
}

Example output:

[47.0°C]
Extracted: 47
[47.1°C]
Extracted: 47.1
[105.0°C]
Extracted: 105
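
As a possible variation, and closer to the capture group the question was aiming for, the sketch below pulls only the digits out of group 1 and reports success separately from the value. The name tryExtractTemperature is hypothetical, and the pattern assumes every reading carries a leading '+', as in the sample data; negative readings would need a different pattern.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, for illustration only.
// Returns true and stores the first temperature in `out` when one is found.
bool tryExtractTemperature(const std::string& line, double& out) {
    // Group 1 captures just the number, e.g. "47.0" out of "+47.0".
    // Assumption: readings always start with '+', as in the sample data.
    static const std::regex re(R"(\+(\d+\.\d+))");
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return false; // no reading on this line
    }
    out = std::stod(m[1].str());
    return true;
}
```

Returning a bool alongside the value lets the caller tell "no reading on this line" apart from a genuine reading of 0, which a plain `if (value)` check cannot do.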
rawrex
  • TYVM! I see that you are using regex_search rather than regex_match. Any rules of thumb here? – Quinn Carver Jun 30 '21 at 13:38
  • @QuinnCarver you're welcome! Yeah, `regex_match` acts as a predicate, entire line has to match (*"contains or not?"*), while `regex_search` will look through the line, searching for specified pattern. – rawrex Jun 30 '21 at 13:41
  • @QuinnCarver just in case, it was covered in this [answer](https://stackoverflow.com/a/26696318/14624729). – rawrex Jun 30 '21 at 13:47
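
The difference described in the comments above can be sketched in a few lines. The helper names wholeLineMatches and lineContainsMatch are hypothetical; they only wrap the two standard calls so the contrast is easy to see.

```cpp
#include <regex>
#include <string>

// Hypothetical wrappers, for illustration only.

// regex_match succeeds only if the pattern covers the entire string.
bool wholeLineMatches(const std::string& line, const std::regex& re) {
    return std::regex_match(line, re);
}

// regex_search succeeds if the pattern occurs anywhere inside the string.
bool lineContainsMatch(const std::string& line, const std::regex& re) {
    return std::regex_search(line, re);
}
```

With a pattern such as R"(\+\d\d\.\d)" and a line such as "temp1: +47.0 C", wholeLineMatches should return false because of the surrounding text, while lineContainsMatch should return true. This is also why "temp1(.)*C" worked in the question only on lines that both start with "temp1" and end with "C".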