
I'm using a regex to separate the fields of an HTTP request:

GET /index.asp?param1=hello&param2=128 HTTP/1.1

This way:

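// Assumes <cstdio>, <regex> and <string> are included, `using namespace std;`,
// and that `query` holds the request line shown above.
smatch m;
string document, querystring;
try
{
    regex re1("(GET|POST) (.+) HTTP");
    regex_search(query, m, re1);
}
catch (const regex_error& e)
{
    printf("Regex 1 Error: %d\n", static_cast<int>(e.code()));
}
string method = m[1];
string path = m[2];

try
{
    regex re2("/(.+)?\\?(.+)?");
    if (regex_search(path, m, re2))
    {
        document = m[1];
        querystring = m[2];
    }
}
catch (const regex_error& e)
{
    printf("Regex 2 Error: %d\n", static_cast<int>(e.code()));
}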

Unfortunately, this code works with MSVC but not with GCC 4.8.2 (the version I have on Ubuntu Server 14.04). Can you suggest a different way of splitting that string, perhaps using plain std::string operations?

I don't know how to split the URL into its components, since the query string separator '?' may or may not be present.

jww
Mark Miles
  • You could consider using the boost regex library ( http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/index.html ) – Christophe Feb 01 '15 at 22:01
  • Please give some more input. Your code cannot work if (what you mentioned) the query string separator is missing. So what are the inputs on both platforms? @Christophe: always pointing to boost may not always be a good hint. – St0fF Feb 01 '15 at 22:04
  • @St0fF Sorry, but his code can work: I cut and pasted it into MSVC2013 and got the expected results (method="GET", document="index.asp", querystring="param1=hello&param2=128") – Christophe Feb 01 '15 at 22:17
  • @MarkMiles Could you please tell us what doesn't work? I tested your code on ideone ( https://ideone.com/QZqQM1 ) and it also returned correct results (with gcc 4.9.2) – Christophe Feb 01 '15 at 22:20
  • As I wrote in my post: "Unfortunately this code works in MSVC but not with GCC 4.8.2". It must work on Ubuntu 14.04. If I do `#gcc --version` it says it's 4.8.2 but if I do `apt search gcc-4.9` I get `gcc-4.9-base/trusty,now 4.9-20140406-0ubuntu1 armhf [installed] GCC, the GNU Compiler Collection (base package)` so I don't know how to update my gcc. – Mark Miles Feb 01 '15 at 22:33
  • Also, I'd prefer not to use boost, thanks. – Mark Miles Feb 01 '15 at 22:34
  • http://stackoverflow.com/q/20027305/502399 – Tavian Barnes Feb 01 '15 at 23:08

3 Answers


You might use std::istringstream to parse this:

#include <iostream>
#include <map>
#include <sstream>
#include <string>

int main()
{
    std::string request = "GET /index.asp?param1=hello&param2=128 HTTP/1.1";

    // separate the 3 main parts

    std::istringstream iss(request);

    std::string method;
    std::string query;
    std::string protocol;

    if(!(iss >> method >> query >> protocol))
    {
        std::cout << "ERROR: parsing request\n";
        return 1;
    }

    // reset the std::istringstream with the query string

    iss.clear();
    iss.str(query);

    std::string url;

    if(!std::getline(iss, url, '?')) // remove the URL part
    {
        std::cout << "ERROR: parsing request url\n";
        return 1;
    }

    // store query key/value pairs in a map
    std::map<std::string, std::string> params;

    std::string keyval, key, val;

    while(std::getline(iss, keyval, '&')) // split each term
    {
        std::istringstream kv(keyval); // separate stream, so the outer iss is not shadowed

        // split key/value pairs
        if(std::getline(std::getline(kv, key, '='), val))
            params[key] = val;
    }

    std::cout << "protocol: " << protocol << '\n';
    std::cout << "method  : " << method << '\n';
    std::cout << "url     : " << url << '\n';

    for(auto const& param: params)
        std::cout << "param   : " << param.first << " = " << param.second << '\n';
}

Output:

protocol: HTTP/1.1
method  : GET
url     : /index.asp
param   : param1 = hello
param   : param2 = 128
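The question also worried that the '?' may be missing. The same std::istringstream technique handles that case without extra code: if there is no '?', std::getline consumes the whole target and the key/value loop simply runs zero times. Below is a sketch of the answer's code refactored into a function so this can be checked; `ParsedRequest` and `parse_request` are hypothetical names, not part of any library.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type for the sketch below.
struct ParsedRequest {
    std::string method, url, protocol;
    std::map<std::string, std::string> params;
};

// Hypothetical helper: same istringstream-based parsing as the answer above.
bool parse_request(const std::string& request, ParsedRequest& out)
{
    std::istringstream iss(request);
    std::string target;
    if (!(iss >> out.method >> target >> out.protocol))
        return false; // fewer than three whitespace-separated fields

    // Everything before the first '?' is the URL. Without a '?', getline
    // reads the whole target and sets eof, so the loop below never runs.
    std::istringstream ts(target);
    std::getline(ts, out.url, '?');

    std::string keyval;
    while (std::getline(ts, keyval, '&')) {
        std::istringstream kv(keyval);
        std::string key, val;
        // Terms without '=' are skipped, as in the original code.
        if (std::getline(std::getline(kv, key, '='), val))
            out.params[key] = val;
    }
    return true;
}
```

For "GET /index.asp HTTP/1.1" this yields url "/index.asp" and an empty params map.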
Galik

The reason it doesn't work with GCC 4.8.2 is that regex_search is not implemented in that version of libstdc++ (GCC's standard library). If you look inside regex.h, here is what you get:

template<typename _Bi_iter, typename _Alloc,
    typename _Ch_type, typename _Rx_traits>
inline bool
regex_search(_Bi_iter __first, _Bi_iter __last,
        match_results<_Bi_iter, _Alloc>& __m,
        const basic_regex<_Ch_type, _Rx_traits>& __re,
        regex_constants::match_flag_type __flags
        = regex_constants::match_default)
{ return false; }

Use regex_match instead, which is implemented. You would have to modify your regex (e.g., add .* before and after it), because regex_match only succeeds if the pattern matches the entire string.
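A sketch of what that change might look like, assuming a conforming <regex> implementation (it has not been tested against GCC 4.8.2's partial one). The first pattern is the question's re1 with a trailing .* so that regex_match can consume the protocol version. The second pattern is not the question's re2 but an assumed variation: the "?query" part is made optional, so a path without '?' still matches and querystring comes back empty. `split_with_regex_match` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: splits a request line using std::regex_match only.
bool split_with_regex_match(const std::string& request,
                            std::string& method,
                            std::string& document,
                            std::string& querystring)
{
    std::smatch m;
    // regex_match must cover the whole input, hence ".*" after "HTTP".
    if (!std::regex_match(request, m, std::regex("(GET|POST) (.+) HTTP.*")))
        return false;
    method = m[1].str();
    const std::string path = m[2].str();

    // Assumed variation on re2: "[^?]*" stops at the first '?', and the
    // non-capturing group makes the "?query" suffix optional.
    if (!std::regex_match(path, m, std::regex("/([^?]*)(?:\\?(.*))?")))
        return false;
    document = m[1].str();
    querystring = m[2].str(); // empty when group 2 did not participate
    return true;
}
```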

Alternatives:

  1. Upgrade to gcc 4.9
  2. Use boost::regex instead
  3. Switch to LLVM and libc++ (my preference).
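Whichever route you take, it can help to check at run time whether the regex_search you are linking against actually works, for example to fall back to a plain string-based parser. The probe below is a hypothetical helper, not a standard facility, and has not been run on GCC 4.8.2 itself. It returns false if regex_search reports no match on a trivially matching input (as the stub quoted above would) or if constructing the regex throws regex_error.

```cpp
#include <regex>
#include <string>

// Hypothetical probe: true only if regex_search finds a trivial match
// and captures the expected group.
bool regex_search_works()
{
    try {
        const std::string text = "xaby";
        std::smatch m;
        return std::regex_search(text, m, std::regex("a(b)")) && m[1].str() == "b";
    } catch (const std::regex_error&) {
        return false; // an incomplete regex implementation may throw here
    }
}
```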

If you want to avoid the use of regex, you can use standard string operations:

#include <iostream>
#include <stdexcept>
#include <string>
using namespace std;

int main()
{
    string query = "GET /index.asp?param1=hello&param2=128 HTTP/1.1";
    string method, path, document, querystring;

    try {
        if (query.substr(0, 5) == "GET /")          // first check the method at the beginning
            method = "GET";
        else if (query.substr(0, 6) == "POST /")
            method = "POST";
        else
            throw runtime_error("Regex 1 Error: no valid method or missing /");

        path = query.substr(method.length() + 2);   // take the rest, skipping the space and the slash
        size_t ph = path.find(" HTTP");             // find the end of the URL
        if (ph == string::npos)                     // if it's not found => error
            throw runtime_error("Regex 2 Error: no HTTP version found");
        path.resize(ph);                            // otherwise drop the end of the string

        size_t pq = path.find('?');                 // look for the '?'
        if (pq == string::npos) {                   // if it's absent, the document is the whole string
            document = path;
            querystring = "";
        }
        else {                                      // otherwise cut it into 2 parts
            document = path.substr(0, pq);
            querystring = path.substr(pq + 1);
        }

        cout << "method:      " << method << endl
             << "document:    " << document << endl
             << "querystring: " << querystring << endl;
    }
    catch (const std::exception& e) {               // runtime_error derives from std::exception
        cout << e.what() << endl;
    }
}

Of course, this code is not as nice as your original regex-based version, so treat it as a workaround in case you cannot use an up-to-date compiler.

Christophe