
I want to write a function in C++ that replaces C's sscanf and assigns the matches to an iterator.

Basically, I want something like:

string s = "0.5 6 hello";
std::vector<boost::any> any_vector;
sscanv(s, "%f %i %s", any_vector);
cout << "float: " << any_cast<float>(any_vector[0]);
cout << "integer: " << any_cast<int>(any_vector[1]);
cout << "string: " << any_cast<string>(any_vector[2]);

The exact details may vary, but you get the idea. Any ideas for implementation?

Options considered so far, along with their problems:

  • std::istringstream: there's no manipulator for matching constant expressions
  • Boost.Regex: not sure if this will work and it seems much more complicated than necessary for this
  • Boost.Spirit: don't think this will work for dynamically generated format strings and it also seems more complicated than necessary
  • sscanf: it would work, but is non-standard, etc, and using it would require a lot of overhead since the number of arguments is determined at compile time
deuberger
  • Would you be open to using c++0x concepts? – Bradley Swain Feb 17 '11 at 22:34
  • @Bradley: I thought they were removed from the draft for this standard? – Xeo Feb 17 '11 at 22:42
  • @Bradley: definitely. I'm using gcc 4.5.1 and am compiling with -std=c++0x, so as long as it works with that I'm open to it. I don't want to wait for something that isn't yet available though. Of course, I'm still interested in hearing about it even if it's not available yet, but I need a solution for now. – deuberger Feb 17 '11 at 22:48
  • I guess variadic templates might be more convenient than vector (something like in http://en.wikipedia.org/wiki/C%2B%2B0x#Variadic_templates), although parsing the format string is still the biggest question. – UncleBens Feb 17 '11 at 22:55
  • @UncleBens: Thanks, I'll look into those, but you're right in that they won't solve my primary problem. – deuberger Feb 17 '11 at 23:02
  • You could write a manipulator that matches constant expressions perhaps. – Logan Capaldo Feb 17 '11 at 23:34
  • @Logan Capaldo: thanks, if I go that route, I'll do that, but I'd prefer something more intuitive and powerful. – deuberger Feb 17 '11 at 23:39
  • @deuberger my suggestion was going to be variadic templates as well. Something like sscanv(s, atuple) and suggest that maybe you could split the string and use boost::lexical_cast. – Bradley Swain Feb 18 '11 at 14:56
  • I think it should be made to work in a symmetrical fashion to `boost::format`, since `boost::format` is a type-safe replacement for `printf`. – Emile Cormier Feb 19 '11 at 18:43
  • @Bradley: variadic templates look awesome, but I'd prefer something more dynamic (i.e. format string can be given at run time). – deuberger Feb 21 '11 at 15:12
  • @Emile: I agree and will try to look into whether something like that could be added to boost (of course, it would probably be a while). – deuberger Feb 21 '11 at 15:15

2 Answers


If your format string is determined at compile time, several variadic-template printf replacements have already been written. Inverting one of those should work reasonably well.

You could then use istream's >> operator for reading, or the c-stdlib functions.
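
As a rough sketch of the compile-time route (not taken from any existing printf replacement): the argument types stand in for the format string, and each value is read with istream's >> operator. The names scan and scan_stream are hypothetical, and the code assumes a compiler with C++0x variadic templates.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Base case: nothing left to read; report whether the stream is still usable.
inline bool scan_stream(std::istream& in) {
    return !in.fail();
}

// Recursive case: extract one value with operator>>, then handle the rest.
template <typename T, typename... Rest>
bool scan_stream(std::istream& in, T& first, Rest&... rest) {
    if (!(in >> first)) {
        return false;  // the input did not match the type of this target
    }
    return scan_stream(in, rest...);
}

// Hypothetical entry point: the target types play the role of "%f %i %s",
// so no format string has to be parsed at all.
template <typename... Args>
bool scan(const std::string& input, Args&... args) {
    std::istringstream in(input);
    return scan_stream(in, args...);
}
```

Calling scan("0.5 6 hello", f, n, s) with a float, an int and a std::string fills all three and returns true. The limitation is the one raised in the comments: the set of types is fixed at compile time.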

Macke
  • That may be the best solution, but ideally I'd prefer something that could work on dynamic format strings (i.e. loaded at runtime), but I think this would only work if the format was specified at compile time, which is essentially the same limitation of sscanf. – deuberger Feb 17 '11 at 23:43
  • I see. Well, that shouldn't be too hard either. If your parsing needs are basic, I don't see why you can't use istream to read int/float/string out of the buffer, then stuff them in your any-vector (or store it in an output iterator, if you want to be fancy). – Macke Feb 17 '11 at 23:46
  • OTOH, if your format string is dynamic, the types and length of your vector will be dynamic too, so you'll have to be very generic with what you handle there, but perhaps your use-case and other code matches that. (I'm a bit curious on what you're trying to do... :) – Macke Feb 17 '11 at 23:49
  • I'd say my parsing needs are more than basic but less than advanced. I basically want to replicate the functionality of sscanf, but make it more dynamic. I may end up using the istream option if no one has a better recommendation. Thanks. – deuberger Feb 17 '11 at 23:51
  • Oops, didn't see your last comment... yes my vector will have to be dynamic as well. The use case necessitates it (at least to get a generic solution). I plan on using Boost.Any or maybe making my own version based on that concept that fits my needs a little better. In short, there are a lot of "text files" of various formats that I want to parse and from which I want to create trend data. I'd prefer to write and compile once and then use configuration files to add support for new file formats rather than always having to add more code. – deuberger Feb 17 '11 at 23:56

What about this?

#include <stdexcept>
#include <string>
#include <vector>
#include <boost/any.hpp>
#include <boost/lexical_cast.hpp>

void sscanf(std::string str,
            const std::string& format,
            std::vector<boost::any>& result)
{
  std::string::const_iterator i = format.begin();
  while (i != format.end())
  {
    if (*i == '%')
    {
      ++i; // now *i is the conversion specifier
      if (i == format.end())
        throw std::runtime_error("incomplete conversion specifier");
      char specifier = *i;

      ++i; // now i is at the next separator, or at the end of the format
      // the field runs up to the next separator; a trailing specifier
      // takes the rest of the input (and avoids dereferencing end())
      std::string extract =
        (i == format.end()) ? str : str.substr(0, str.find(*i));

      switch (specifier)
      {
        // matching an integer
        case 'i':
          result.push_back(boost::lexical_cast<int>(extract));
          break;
        // matching a floating point number
        case 'a': case 'e': case 'f': case 'g':
          result.push_back(boost::lexical_cast<float>(extract));
          break;
        // matching a single character
        case 'c':
          result.push_back(boost::lexical_cast<char>(extract));
          break;
        // matching a string
        case 's':
          result.push_back(extract);
          break;
        // invalid conversion specifier, throwing an exception
        default:
          throw std::runtime_error("invalid conversion specifier");
      }
    }
    else
    {
      // if it's not a %, eat the input up to and including this separator
      str.erase(0, str.find(*i)+1);
      ++i;
    }
  }
}

Some conversion specifiers are missing, but in principle it works.
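
For comparison, here is a Boost-free sketch of the same run-time idea built on std::istringstream, along the lines suggested in the comments on the other answer. Value and scan_runtime are made-up names for illustration; Value is only a crude stand-in for boost::any, and only %i, %f and %s are handled. Whitespace in the format matches any run of whitespace in the input, and every other character must match literally.

```cpp
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for boost::any: `kind` says which member is valid.
struct Value {
    char kind;      // 'i', 'f' or 's'
    long i;
    double f;
    std::string s;
};

// Walks the format string at run time and extracts matching fields from input.
std::vector<Value> scan_runtime(const std::string& input,
                                const std::string& format) {
    std::istringstream in(input);
    std::vector<Value> result;
    for (std::string::size_type pos = 0; pos < format.size(); ++pos) {
        const char c = format[pos];
        if (c == '%') {
            if (++pos == format.size()) {
                throw std::runtime_error("incomplete conversion specifier");
            }
            Value v = Value();  // zero-initialises the numeric members
            v.kind = format[pos];
            switch (v.kind) {
                case 'i': in >> v.i; break;
                case 'f': in >> v.f; break;
                case 's': in >> v.s; break;
                default:
                    throw std::runtime_error("invalid conversion specifier");
            }
            if (!in) {
                throw std::runtime_error("input does not match format");
            }
            result.push_back(v);
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            in >> std::ws;  // skip any amount of whitespace in the input
        } else if (in.get() != std::char_traits<char>::to_int_type(c)) {
            throw std::runtime_error("literal in format not found in input");
        }
    }
    return result;
}
```

With the input from the question, scan_runtime("0.5 6 hello", "%f %i %s") yields three values of kind 'f', 'i' and 's'. Because the format is an ordinary string, it can just as well be loaded from a configuration file.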

Karl von Moor