
I have a program that parses the components of an HTTP Digest header, like this:

#include "stdafx.h"
#include <iostream>
#include <string>
#include <regex>
#include <unordered_map>

int main()
{
    std::string nsInput = R"(Digest realm = "http-auth@example.org",
        qop= " auth, auth-int ", algorithm = MD5 ,
        nonce ="7ypf/xlj9XXwfDPEoM4URrv/xwf94BcCAzFZH4GiTo0v"    ,
        opaque="FQhe/qaU925kfnzjCev0ciny7QMkPqMAFRtzCUYo5tdS"
    )";
    //  Spaces are inserted into some places of the input intentionally

    std::smatch mat_opt, mat_val;
    std::unordered_map<std::string, std::string> mapDigest;

    try {
        std::regex rex_opt(R"(\s*([A-Za-z]{3,})\s*=)");
        std::regex rex_val(R"(\s*\"\s*(.{3,})\s*\"|\s*(.{3,})\s*,)");

        auto& str = nsInput;
        while (std::regex_search(nsInput, mat_opt, rex_opt))
        {
            if (mat_opt.size() >= 2) {
                auto& field = mat_opt[1].str();
                std::string& next = mat_opt.suffix().str();

                if (std::regex_search(next, mat_val, rex_val) && mat_val.size() >= 2) {
                    auto& value = mat_val[1].str();
                    mapDigest[field] = value;
                }

                str = mat_opt.suffix().str();
            }
        }

        for (auto& itr : mapDigest) {
            std::cout << itr.first << ":" << itr.second << ".\n";
        }
    }
    catch (std::regex_error& e) {
        std::cout << "regex_search failed" << e.what() << "\n";
    }

    return 0;
}

The output:

nonce:7ypf/xlj9XXwfDPEoM4URrv/xwf94BcCAzFZH4GiTo0v.
realm:http-auth@example.org.
qop:auth, auth-int .
algorithm:.
opaque:FQhe/qaU925kfnzjCev0ciny7QMkPqMAFRtzCUYo5tdS.

There are two problems I am trying to solve:

1) Trailing spaces still appear at the end of the "qop" value.

2) The value of "algorithm" is not matched (it comes out empty).

Could someone explain the cause and how to fix it?

Thanks

SteveH
  • Why do you insist on a regular expression? – Jonathon Reinhart Feb 02 '18 at 02:15
  • `algorithm = MD5` not `algorithm = "MD5"`, but your pattern matches `\"` – Brett7533 Feb 02 '18 at 02:16
  • A regular expression is not how you go about parsing HTTP. – Sam Varshavchik Feb 02 '18 at 02:21
  • @Jonathon, Sam: I have tried sscanf, stringstream and Spirit, but found that regex seemed the easiest way to do the job. If you know an approach that is easier or more elegant than regex, please tell me. – SteveH Feb 02 '18 at 04:33
  • @Brett Look at the pattern again: the alternative after the '|' operator is there to match a value without quotes. According to RFC 7616, the algorithm field is not surrounded by quotes. – SteveH Feb 02 '18 at 04:39

2 Answers


First, your code cannot compile because you are trying to bind a non-const lvalue reference to a temporary object in the following lines:

// ...
auto& field = mat_opt[1].str();
// ...
std::string& next = mat_opt.suffix().str();
// ...
auto& value = mat_val[1].str();
// ...

I recommend removing the references and using plain auto or std::string instead. Thanks to copy elision (RVO), this costs little or no performance.

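To remove the spaces at the end of the values, you can use .{3,}? instead of .{3,} in your regex pattern. Without the ?, .{3,} is greedy: it consumes as many characters as possible (including white space) and backtracks only until the closing quote can match, so any space before that quote stays inside the capture group.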

The string MD5 is matched by the second capture group in your regex pattern, so you need to access it through mat_val[2] rather than mat_val[1]. You can use a conditional expression as follows:

auto value = mat_val[1].matched ? mat_val[1].str() : mat_val[2].str();

BTW, since you are using a raw string literal, you don't need to write an extra \ before the " character in your regex pattern.

xskxzr
  • Thanks for your tips. It works like a charm now. One thing: the code I posted compiles fine with VS 2017, but it's better to remove the references anyway. – SteveH Feb 02 '18 at 06:47

As stated by others, regex may not be the weapon of choice to parse HTTP digests.

Nevertheless, I found the pattern challenging. What makes it harder than necessary is the fact that you have separators in quotes that should be ignored (in the qop-part). Your other problems stem from greedy matches (e.g. the {3,}-part).

Anyway, this is what I got after 15 minutes:

=\s*((?:[^,"]|"\s*([^"]*?)\s?")+?)(?=\s*,|$)

Demo

Update: I went the extra mile - just to prove my point.

#include <iostream>
#include <string>
#include <regex>
#include <unordered_map>

int main()
{
    std::string nsInput = R"(Digest realm = "http-auth@example.org",
        qop= " auth, auth-int ", algorithm = MD5 ,
        nonce ="7ypf/xlj9XXwfDPEoM4URrv/xwf94BcCAzFZH4GiTo0v"    ,
        opaque="FQhe/qaU925kfnzjCev0ciny7QMkPqMAFRtzCUYo5tdS"
    )";
    //  Spaces are inserted into some places of the input intentionally

    std::smatch mat_opt, mat_val;
    std::unordered_map<std::string, std::string> mapDigest;

    try {
        std::regex rex_opt(R"(\s*([A-Za-z]{3,})\s*=)");
        std::regex rex_val("=\\s*((?:[^,\"]|\"\\s*([^\"]*?)\\s?\")+?)(?=\\s*,|$)");

        auto& str = nsInput;
        while (std::regex_search(nsInput, mat_opt, rex_opt))
        {
            if (mat_opt.size() >= 2) {
                auto field = mat_opt[1].str();

                if (std::regex_search(nsInput, mat_val, rex_val)) {
                    auto value = mat_val[2].matched ? mat_val[2].str() : mat_val[1].str();
                    mapDigest[field] = value;
                }

                str = mat_opt.suffix().str();
            }
        }

        for (auto& itr : mapDigest) {
            std::cout << itr.first << ":" << itr.second << ".\n";
        }
    }
    catch (std::regex_error& e) {
        std::cout << "regex_search failed" << e.what() << "\n";
    }

    return 0;
}

Output:

opaque:FQhe/qaU925kfnzjCev0ciny7QMkPqMAFRtzCUYo5tdS.
nonce:7ypf/xlj9XXwfDPEoM4URrv/xwf94BcCAzFZH4GiTo0v.
algorithm:MD5.
realm:http-auth@example.org.
qop:auth, auth-int.
wp78de
  • I forgot: the values you are looking for are always in the last group of a match. So usually it is the 3rd group (mat_val[2]); only for MD5 is it the 2nd (mat_val[1]). – wp78de Feb 02 '18 at 06:20
  • Thanks for your answer; unfortunately, your pattern does not work. – SteveH Feb 02 '18 at 06:54
  • It does. Have you clicked the demo link? – wp78de Feb 02 '18 at 06:55
  • Yes, the demo seems to work, but only the values are matched; it does not take the field names into account. When I copied your pattern into my source, MSVC gave the wrong output. – SteveH Feb 02 '18 at 07:13
  • Yes, it works. I realized that you changed a few things in my code to adapt it to your pattern; otherwise the output can't be correct. Thanks for your effort. – SteveH Feb 02 '18 at 09:59
  • You are allowed to make code changes as needed. StackOverflow is not a free coding service. – wp78de Feb 02 '18 at 16:43