
I'm working in C++11, no Boost. I have a function that takes as input a std::string that contains a series of key-value pairs, delimited with semicolons, and returns an object constructed from the input. All keys are required, but may be in any order.

Here is an example input string:

Top=0;Bottom=6;Name=Foo;

Here's another:

Name=Bar;Bottom=20;Top=10;

There is a corresponding concrete struct:

struct S
{
    const uint8_t top;
    const uint8_t bottom;
    const std::string name;
};

I've implemented the function by repeatedly running a regular expression on the input string, once per member of S, and assigning the captured group of each to the relevant member of S, but this smells wrong. What's the best way to handle this sort of parsing?

Justin R.
  • What is the regular expression? And why does it seem wrong to you? – David Elson Nov 19 '14 at 00:49
  • tokenize on `;`, then tokenize on `=` – Red Alert Nov 19 '14 at 00:53
  • My first thought would probably be to use find_first_of, but I don't think using a regex smells funny either. – Ryan Hartman Nov 19 '14 at 00:53
  • The regex is along the lines of "Top=([0-9]+)", etc. It seems like it might be wrong, as I'm repeatedly matching against the same string, which seems wasteful. Or maybe it's just fine! I guess I'm thinking that this sort of problem is frequently encountered and has a canonical solution. – Justin R. Nov 19 '14 at 00:55
  • I have a feeling that the answer might be "don't use regex". It just seems implied in the question that you want a regex solution. –  Nov 19 '14 at 00:58
  • Nothing wrong with it besides lower-than-optimal performance; a more efficient approach would be to parse the string once. Of course, these strings will always be short, so performance isn't an issue here. Your approach is quicker to write and takes barely any time to run, so I would stick with it. – ShellFish Nov 19 '14 at 00:59
  • Nothing wrong with that. Could also do something like ([a-zA-Z]+)=([0-9]+); then repeatedly start the next search after the current match, the position of which can be ascertained from the match object. – David Elson Nov 19 '14 at 01:01
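
The single-pass idea from the last comment can be sketched roughly as follows. This is only an illustration, not the commenter's code: the helper name scan_pairs is hypothetical, and the pattern is loosened to ([A-Za-z]+)=([^;]*); so that non-numeric values such as Name=Foo also match.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: walks the input once and collects every
// "Key=Value;" pair into a map, whatever order the keys appear in.
std::map<std::string, std::string> scan_pairs(const std::string& input) {
    // Group 1 is the key, group 2 is everything up to the next ';'.
    static const std::regex pair_re{R"(([A-Za-z]+)=([^;]*);)"};

    std::map<std::string, std::string> pairs;
    // std::sregex_iterator resumes each search after the previous match.
    for (std::sregex_iterator it{input.begin(), input.end(), pair_re}, end;
         it != end; ++it) {
        pairs[(*it)[1].str()] = (*it)[2].str();
    }
    return pairs;
}
```

The caller would still have to check that all three required keys are present and convert Top and Bottom to numbers.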

2 Answers


For an easily readable solution, you can use, for example, std::regex_token_iterator and a sorted container to separate the attribute-value pairs (alternatively, use an unsorted container and std::sort).

std::regex r{R"([^;]+;)"};
std::set<std::string> tokens{std::sregex_token_iterator{std::begin(s), std::end(s), r}, std::sregex_token_iterator{}};

The attribute-value strings in the set tokens are now sorted lexicographically, regardless of their order in the input: Bottom comes first, then Name, and Top last.

Lastly, use std::string::find and std::string::substr to extract the desired part of each string.
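
Put together, the approach might look something like the sketch below. The names parse_sorted and value_of are made up for illustration, the struct is repeated from the question, and the key check uses std::string::compare instead of find for brevity. It assumes every pair ends with a semicolon and that numeric values fit in a uint8_t (larger values would silently wrap in the cast).

```cpp
#include <cstdint>
#include <iterator>
#include <regex>
#include <set>
#include <stdexcept>
#include <string>

struct S {
    const std::uint8_t top;
    const std::uint8_t bottom;
    const std::string name;
};

// Hypothetical helper: checks that the token starts with "key=" and
// returns the text between that prefix and the trailing ';'.
std::string value_of(const std::string& token, const std::string& key) {
    const std::string prefix = key + "=";
    if (token.compare(0, prefix.size(), prefix) != 0)
        throw std::invalid_argument("expected key " + key + " in: " + token);
    return token.substr(prefix.size(), token.size() - prefix.size() - 1);
}

// Hypothetical parser built on the sorted-set idea.
S parse_sorted(const std::string& s) {
    const std::regex r{R"([^;]+;)"};
    // The set orders the tokens lexicographically: Bottom, Name, Top.
    const std::set<std::string> tokens{
        std::sregex_token_iterator{std::begin(s), std::end(s), r},
        std::sregex_token_iterator{}};
    if (tokens.size() != 3)
        throw std::invalid_argument("expected exactly three distinct pairs");

    auto it = tokens.begin();
    const std::string bottom = value_of(*it++, "Bottom");
    const std::string name = value_of(*it++, "Name");
    const std::string top = value_of(*it, "Top");

    // std::stoi returns int; the casts avoid a narrowing error in the braces.
    return S{static_cast<std::uint8_t>(std::stoi(top)),
             static_cast<std::uint8_t>(std::stoi(bottom)),
             name};
}
```

With the first example input from the question, parse_sorted("Top=0;Bottom=6;Name=Foo;") yields top 0, bottom 6 and name Foo. Note that this relies on the key names happening to sort in a known order, so adding a member to S means revisiting the extraction order.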


Felix Glas

Do you care about performance or readability? If readability is good enough, then pick your favorite version of split from this question and away we go:

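std::map<std::string, std::string> tag_map;

for (const std::string& tag : split(input, ';')) {
    // Split the current pair, not the whole input.
    auto key_val = split(tag, '=');
    tag_map.insert(std::make_pair(key_val[0], key_val[1]));
}

// Keys are case-sensitive and must match the input ("Top", not "top").
// std::stoi returns int, so cast explicitly to avoid a narrowing error.
S s{static_cast<uint8_t>(std::stoi(tag_map["Top"])),
    static_cast<uint8_t>(std::stoi(tag_map["Bottom"])),
    tag_map["Name"]};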
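
For completeness, here is a self-contained sketch of the same idea. The split shown is just one possible version (built on std::getline, not necessarily the one from the linked question), and parse_map is a hypothetical name. It uses map::at so that a missing key throws std::out_of_range instead of silently inserting an empty value, and it assumes numeric values fit in a uint8_t.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct S {
    const std::uint8_t top;
    const std::uint8_t bottom;
    const std::string name;
};

// One possible split: cuts text at every occurrence of delim.
std::vector<std::string> split(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string part;
    while (std::getline(in, part, delim))
        parts.push_back(part);
    return parts;
}

// Hypothetical parser: splits on ';', then splits each pair on '='.
S parse_map(const std::string& input) {
    std::map<std::string, std::string> tag_map;
    for (const std::string& tag : split(input, ';')) {
        const std::vector<std::string> key_val = split(tag, '=');
        // Rejects pairs without '=' and, with this split, empty values.
        if (key_val.size() != 2)
            throw std::invalid_argument("malformed pair: " + tag);
        tag_map[key_val[0]] = key_val[1];
    }
    // at() throws std::out_of_range when a required key is absent.
    return S{static_cast<std::uint8_t>(std::stoi(tag_map.at("Top"))),
             static_cast<std::uint8_t>(std::stoi(tag_map.at("Bottom"))),
             tag_map.at("Name")};
}
```

Calling parse_map("Name=Bar;Bottom=20;Top=10;") gives top 10, bottom 20 and name Bar, independent of the key order in the input.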
Barry