17

I'm wondering if there is a library like Boost Format, but which supports named parameters rather than positional ones. This is a common idiom in e.g. Python, where you have a context to format strings with that may or may not use all available arguments, e.g.

mouse_state = {}
mouse_state['button'] = 0
mouse_state['x'] = 50
mouse_state['y'] = 30

#...

"You clicked %(button)s at %(x)d,%(y)d." % mouse_state
"Targeting %(x)d, %(y)d." % mouse_state

Are there any libraries that offer the functionality of those last two lines? I would expect it to offer an API something like:

PrintFMap(string format, map<string, string> args);

In Googling I have found many libraries offering variations of positional parameters, but none that support named ones. Ideally the library has few dependencies so I can drop it easily into my code. C++ won't be quite as idiomatic for collecting named arguments, but probably someone out there has thought more about it than me.

Performance is important; in particular, I'd like to keep memory allocations down (always tricky in C++), since this may be run on devices without virtual memory. But even a slow one to start from will probably be faster than writing it from scratch myself.

BCS
  • 1
    Boost does have a library that makes the first line possible. But I'm going to go out on a limb and say the second two are simply not possible without some serious pre-processor trickery. – Dennis Zickefoose Sep 11 '10 at 18:39
  • 1
    That was just an example in Python of what I'm trying to do, I don't expect that syntax in C++. And, I'm pretty sure I don't really need boost to make a map. ;) –  Sep 11 '10 at 18:44
  • Even if you're loose with the syntax, you won't be able to extract member names out of strings at run-time the way you want. C++ simply doesn't support it. And you *do* need Boost to use named parameters to function calls. – Dennis Zickefoose Sep 11 '10 at 18:57
  • the second is easily done by using `map>` i think – Johannes Schaub - litb Sep 11 '10 at 19:07
  • 1
    I suppose if you're dealing with maps, and not structures, it would be possible, although I'm not familiar with any existing libraries that do so. – Dennis Zickefoose Sep 11 '10 at 19:10
  • 1
    Your requirement makes little sense: Either you can have this syntactic sugar which will -- compared to printf and friends, and probably even to Boost.Format -- result in significant memory and processing overhead. Or you can have performance. I don't think both are possible. – Martin Ba Sep 12 '10 at 16:34
  • 1
    I don't see how named replacements are syntactic sugar - it's like saying a map is syntactic sugar for an array of key/value pairs. Yes, on some level they're equivalent, but the _semantics_ of one is useful in many cases that the other is not. I also don't know why everyone is focusing on the question of syntax rather than whether or not any library out there does anything at all like this. –  Sep 12 '10 at 18:26

6 Answers

12

The fmt library supports named arguments:

print("You clicked {button} at {x},{y}.",
      arg("button", "b1"), arg("x", 50), arg("y", 30));

And as syntactic sugar you can even (ab)use user-defined literals to pass arguments:

print("You clicked {button} at {x},{y}.",
      "button"_a="b1", "x"_a=50, "y"_a=30);

For brevity the namespace fmt is omitted in the above examples.

Disclaimer: I'm the author of this library.

vitaut
  • It is not quite the same. Naming a parameter is not enough; the type format is required too. By type formatting I mean which type should be applied (int/float/string/etc.), how many leading zeros an integer or float parameter gets, how many characters a parameter occupies, how it is aligned, and so on. – Andry Oct 12 '18 at 13:32
  • 1
    @Andry all usual format specifiers can be applied to named arguments. – vitaut Oct 12 '18 at 14:01
  • That would be half a solution. All formatting has to be done at once in a single format string, such as: `formatme("%{abc:02u}", FormatDic(abcvar, "abc", defaultabcvalue))`. The reason is simple: the format string could be stored and changed separately from the real input parameters. – Andry Oct 12 '18 at 14:05
  • Another reason is the optionality of the parameters to the right of the format string: the end user should not be required to pass every parameter the format string uses, because the format string can be taken, for example, from a config file. For the sake of optionality we have to drop any ordering requirement on the parameter list in the calling function and so make the parameters fully named. – Andry Oct 12 '18 at 14:15
  • 1
    To clarify: you can do `print("{abc:02u}", "abc"_a=value)` and it will work as expected. The format string can be stored elsewhere. – vitaut Oct 12 '18 at 15:53
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181768/discussion-between-andry-and-vitaut). – Andry Oct 12 '18 at 16:02
  • @vitaut you should mention your dynamic_format_arg_store class. – kervin Nov 08 '20 at 22:22
7

I've always been critical of C++ I/O (especially formatting) because in my opinion it is a step backward with respect to C. Formats need to be dynamic, and it makes perfect sense, for example, to load them from an external resource such as a file or a parameter.

However, I had never actually tried to implement an alternative, and your question prompted me to make an attempt, investing some weekend hours in the idea.

Sure the problem was more complex than I thought (for example just the integer formatting routine is 200+ lines), but I think that this approach (dynamic format strings) is more usable.

You can download my experiment from this link (it's just a .h file) and a test program from this link (test is probably not the correct term, I used it just to see if I was able to compile).

The following is an example:

#include "format.h"
#include <iostream>

using format::FormatString;
using format::FormatDict;

int main()
{
    std::cout << FormatString("The answer is %{x}") % FormatDict()("x", 42);
    return 0;
}

It differs from the Boost.Format approach because it uses named parameters and because the format string and format dictionary are meant to be built separately (and, for example, passed around). I also think that formatting options should be part of the string (as with printf) and not in the code.

FormatDict uses a trick for keeping the syntax reasonable:

FormatDict fd;
fd("x", 12)
  ("y", 3.141592654)
  ("z", "A string");
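The chaining works when operator() returns a reference to the dictionary itself. A minimal sketch of that idea follows, using a hypothetical Dict class that stores every value as a string; it is not the FormatDict code from the header, which may work quite differently:

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for FormatDict; not the library's actual code.
class Dict {
public:
    // Convert any streamable value to text, store it, and return *this
    // so that calls can be chained: d("x", 1)("y", 2.5).
    template <typename T>
    Dict& operator()(const std::string& key, const T& value) {
        std::ostringstream os;
        os << value;
        values_[key] = os.str();
        return *this;
    }

    // Look up a stored value; unknown keys yield an empty string.
    std::string get(const std::string& key) const {
        std::map<std::string, std::string>::const_iterator it = values_.find(key);
        return it == values_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> values_;
};
```

Converting eagerly to strings keeps the sketch short but costs one allocation per entry; a real implementation would more likely store references or type-erased values and convert only when formatting.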

FormatString is instead just parsed from a const std::string& (I decided to preparse format strings; a slower but probably acceptable approach would be to pass the string and reparse it each time).

The formatting can be extended for user defined types by specializing a conversion function template; for example

struct P2d
{
    int x, y;
    P2d(int x, int y)
        : x(x), y(y)
    {
    }
};

namespace format {
    template<>
    std::string toString<P2d>(const P2d& p, const std::string& parms)
    {
        return FormatString("P2d(%{x}; %{y})") % FormatDict()
            ("x", p.x)
            ("y", p.y);
    }
}

after that a P2d instance can be simply placed in a formatting dictionary.

Also it's possible to pass parameters to a formatting function by placing them between % and {.
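One way to read that placement is to split each placeholder into the text between % and { (the parameters) and the text between { and } (the name). A rough sketch under that assumption; Placeholder and parsePlaceholder are hypothetical names for illustration, not part of the library:

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: the two parts of one "%params{name}" placeholder.
struct Placeholder {
    std::string params;  // text between '%' and '{', e.g. "08x"
    std::string name;    // text between '{' and '}', e.g. "hexdata"
    std::size_t end;     // index just past the closing '}'
    bool ok;             // false if no placeholder starts at the given position
};

// Parse a placeholder that starts at text[pos], which must be '%'.
// Simplification: there is no escaping, so any '%' followed later by
// "{...}" is treated as a placeholder.
Placeholder parsePlaceholder(const std::string& text, std::size_t pos) {
    Placeholder p;
    p.end = pos;
    p.ok = false;
    if (pos >= text.size() || text[pos] != '%') return p;
    std::size_t open = text.find('{', pos + 1);
    if (open == std::string::npos) return p;
    std::size_t close = text.find('}', open + 1);
    if (close == std::string::npos) return p;
    p.params = text.substr(pos + 1, open - pos - 1);
    p.name = text.substr(open + 1, close - open - 1);
    p.end = close + 1;
    p.ok = true;
    return p;
}
```

The params string would then be handed to whichever conversion routine handles the value's type, which is where the actual interpretation of the options would happen.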

For now I only implemented an integer formatting specialization that supports

  1. Fixed size with left/right/center alignment
  2. Custom filling char
  3. Generic base (2-36), lower or uppercase
  4. Digit separator (with both custom char and count)
  5. Overflow char
  6. Sign display

I've also added some shortcuts for common cases, for example

"%08x{hexdata}"

is a hex number with 8 digits padded with '0's.

"%026/2,8:{bindata}"

is a 24-bit binary number (as required by "/2") with digit separator ":" every 8 bits (as required by ",8:").
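To make the second shortcut concrete, here is a small sketch of the kind of output it describes: a value written in a given base, zero-padded to a fixed number of digits, with a separator inserted every few digits. formatGrouped is a hypothetical helper written for illustration; it is not the library's integer routine and it does not parse the "%026/2,8:" syntax itself.

```cpp
#include <string>

// Hypothetical helper: write 'value' in 'base' using exactly 'digits' digits
// (zero-padded; high digits are dropped if the value does not fit), with
// 'sep' inserted between groups of 'group' digits counted from the right.
// The caller must pass a base in the range 2..36; group == 0 disables grouping.
std::string formatGrouped(unsigned long value, unsigned base,
                          unsigned digits, unsigned group, char sep) {
    static const char symbols[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string out;  // built in reverse, least significant digit first
    for (unsigned i = 0; i < digits; ++i) {
        if (i != 0 && group != 0 && i % group == 0) out += sep;
        out += symbols[value % base];
        value /= base;
    }
    return std::string(out.rbegin(), out.rend());
}
```

Under these assumptions, formatGrouped(0xABCDEF, 2, 24, 8, ':') gives 10101011:11001101:11101111, which is 24 digits plus two separators, consistent with the width of 26 in the shortcut.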

Note that the code is just an idea. For example, for now I have simply prevented copies, although it is probably reasonable to allow storing both format strings and dictionaries. For dictionaries, however, it is important to be able to avoid copying an object just because it needs to be added to a FormatDict; IMO this is possible, but it raises non-trivial problems about lifetimes.

UPDATE

I've made a few changes to the initial approach:

  1. Format strings can now be copied
  2. Formatting for custom types is done using template classes instead of functions (this allows partial specialization)
  3. I've added a formatter for sequences (two iterators). Syntax is still crude.

I've created a GitHub project for it, under the Boost license.

John
6502
  • As a suggestion: the code as described is close to, but not quite compatible with, Python formatting (s/()/{}/, putting the `x` before the name, etc.). With a little reworking of the parser, you could probably get the major cases to work the same in both languages. – BCS Apr 04 '11 at 04:46
  • I didn't invest a lot of time thinking about the syntax for formatters, for example now I've added sequences but I don't like how I ended up finding the nested format string needed (e.g. to get a comma-space separated list of values the syntax is `"%*/, {L}"` where `*` is replaced with `{x}`). – 6502 Apr 05 '11 at 17:04
2

The answer appears to be, no, there is not a C++ library that does this, and C++ programmers apparently do not even see the need for one, based on the comments I have received. I will have to write my own yet again.

  • Actually, I voted the question up because I find it interesting. I have written some kind of formatter taking a context (map) as argument, but the need was vastly different: I wanted to choose between different possible generated outputs rather than precisely controlling the formatting of numbers, padding, length, etc... I don't think the jump would be too important from the Boost.Format library... but the question is: do you want to try and read boost files ;) ? – Matthieu M. Sep 12 '10 at 18:40
  • 2
    This is not true any more =). There is a C++ library that does this: https://github.com/cppformat/cppformat – vitaut Oct 28 '15 at 15:47
1

Well, I'll add my own answer as well. Not that I know of (or have coded) such a library, but I want to address the "keep the memory allocation down" bit.

As always I can envision some kind of speed / memory trade-off.

On the one hand, you can parse "Just In Time":

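class Formatter:
  def __init__(self, format): self._string = format

  def compute(self, context):
    # Rescan the whole string once per key; no parsed structure is kept.
    # Only the plain "%(name)s" placeholder form is handled here.
    for k, v in context.items():
      placeholder = "%(" + k + ")s"
      text = str(v)
      pos = self._string.find(placeholder)
      while pos != -1:
        self._string = self._string[:pos] + text + self._string[pos + len(placeholder):]
        # Resume after the inserted text so a value is never rescanned.
        pos = self._string.find(placeholder, pos + len(text))
    return self._string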

This way you don't keep a "parsed" structure at hand, and hopefully most of the time you'll just insert the new data in place (unlike Python, C++ strings are not immutable).

However it's far from being efficient...

On the other hand, you can build a fully constructed tree representing the parsed format. You will have several classes like: Constant, String, Integer, Real, etc... and probably some subclasses / decorators as well for the formatting itself.

I think, however, that the most efficient approach would be some kind of mix of the two.

  • explode the format string into a list of Constant, Variable
  • index the variables in another structure (a hash table with open-addressing would do nicely, or something akin to Loki::AssocVector).

There you are: you're done with only 2 dynamically allocated arrays (basically). If you want to allow the same key to be repeated multiple times, use a small vector of size_t as the value of the index. Note that std::vector<size_t> itself allocates dynamically as soon as it holds an element (the 16-byte small buffer in VC++ 2010 applies to std::string, not std::vector), so avoiding that allocation needs a container with inline storage.

When evaluating the context itself, look up the instances. You then parse the formatter "just in time", check it against the current type of the value with which to replace it, and process the format.
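A rough sketch of this mixed scheme, assuming plain {name} placeholders and string-only values. ParsedFormat, parse, and render are hypothetical names, a std::map stands in for the open-addressing hash table suggested above, and the per-placeholder format checking is left out entirely:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One piece of the exploded format string.
struct Segment {
    bool isVariable;   // false: literal text, true: placeholder
    std::string text;  // the literal text, or the variable name
};

struct ParsedFormat {
    std::vector<Segment> segments;
    // Variable name -> positions in 'segments' where it occurs.
    std::map<std::string, std::vector<std::size_t> > index;
};

// Explode "a {x} b" into literal and variable segments and index the variables.
ParsedFormat parse(const std::string& format) {
    ParsedFormat t;
    std::size_t pos = 0;
    while (pos < format.size()) {
        std::size_t open = format.find('{', pos);
        std::size_t close = (open == std::string::npos)
                                ? std::string::npos
                                : format.find('}', open + 1);
        if (close == std::string::npos) {  // no further complete placeholder
            Segment s = {false, format.substr(pos)};
            t.segments.push_back(s);
            break;
        }
        if (open > pos) {
            Segment s = {false, format.substr(pos, open - pos)};
            t.segments.push_back(s);
        }
        Segment v = {true, format.substr(open + 1, close - open - 1)};
        t.index[v.text].push_back(t.segments.size());
        t.segments.push_back(v);
        pos = close + 1;
    }
    return t;
}

// Walk the context, use the index to find where each value goes, then
// concatenate. Variables missing from 'context' render as empty text.
// Note: the scratch vector below is one extra allocation per call.
std::string render(const ParsedFormat& t,
                   const std::map<std::string, std::string>& context) {
    std::vector<const std::string*> values(t.segments.size());
    for (std::map<std::string, std::string>::const_iterator it = context.begin();
         it != context.end(); ++it) {
        std::map<std::string, std::vector<std::size_t> >::const_iterator hit =
            t.index.find(it->first);
        if (hit == t.index.end()) continue;  // context entry not used by format
        for (std::size_t i = 0; i < hit->second.size(); ++i)
            values[hit->second[i]] = &it->second;
    }
    std::string out;
    for (std::size_t i = 0; i < t.segments.size(); ++i) {
        const Segment& s = t.segments[i];
        if (!s.isVariable) out += s.text;
        else if (values[i]) out += *values[i];
    }
    return out;
}
```

The parsed structure can be reused across many contexts, and context entries that the format never mentions cost only one failed lookup each.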

Pros and cons:

  • Just In Time: you scan the string again and again.
  • One Parse: requires a lot of dedicated classes, possibly many allocations, but the format is validated on input. Like Boost it may be reused.
  • Mix: more efficient, especially if you don't replace some values (allow some kind of "null" value), but delaying the parsing of the format delays the reporting of errors.

Personally I would go for the One Parse scheme, trying to keep the allocations down using boost::variant and the Strategy Pattern as much as I could.

Matthieu M.
  • In practice, on many platforms, you will find that the cost of the heap allocation for any kind of vector will be much larger than the "far from efficient" solution, which has major cache advantages. And on platforms without virtual memory, even fast allocations cause slow death by fragmentation. –  Sep 12 '10 at 19:10
  • @Joe: yes the "far from efficient" was a bit much. But it depends heavily on the kind of format. If there are 1 or 2 replacements in a 25 chars strings, it will be efficient; if there are a few dozen of occurrences of each variable in a few kilobytes of text, it'll slow down. That's always the issue with efficiency: small inputs are affected by constants while large inputs are affected by the big O :/ – Matthieu M. Sep 13 '10 at 08:07
0

I've written a library for this purpose; check it out on GitHub.

Contributions are welcome.

Garcia Sylvain
0

Given that Python itself is written in C and that formatting is such a commonly used feature, you might be able (ignoring copyright issues) to rip the relevant code from the Python interpreter and port it to use STL maps rather than Python's native dicts.

BCS