18

I've been given the task of polishing the interface of a codec library. We're using C++17, and I can only use the standard library (i.e. no Boost). Currently, there's a Decoder class that looks roughly like this:

class Decoder : public Codec {

public:

    struct Result {
        vector<uint8_t>::const_iterator new_buffer_begin;
        optional<Metadata>              metadata;
        optional<Packet>                packet;
    };

    Result decode(vector<uint8_t>::const_iterator buffer_begin,
                  vector<uint8_t>::const_iterator buffer_end);

private:
    // irrelevant details
};

The caller instantiates a Decoder, then feeds a stream of data to the decoder by

  1. Reading a chunk of data from a file (but there could be other sources in the future), and appending it to a vector<uint8_t>.

  2. Calling the decode function, passing the iterators for their vector.

  3. If the returned Result's new_buffer_begin is identical to the buffer_begin that was passed to decode, that means there wasn't enough data in the buffer to decode anything, and the caller should go back to step 1. Otherwise, the caller consumes the Metadata or Packet object that was decoded, and goes back to step 2, using new_buffer_begin for the next pass.

The things I dislike about this interface and need help improving:

  • Using vector<uint8_t>::const_iterator seems overly specific. Is there a more generic approach that doesn't force the caller to use vector? I was considering just using C-style interface; a uint8_t * and a length. Is there a C++ alternative that's fairly generic?

  • If there was enough data to decode something, only metadata or packet will have a value. I think std::variant or 2 callbacks (one for each type) would make this code more self-documenting. I'm not sure which is more idiomatic though. What are the pros and cons of each, and is there an even better approach?

splicer
  • 5,344
  • 4
  • 42
  • 47
  • 2
    `Is there a C++ alternative that's fairly generic?` Templates. – tkausl Apr 13 '19 at 22:14
  • `typedef vector::const_iterator it_t;` or `using it_t= vector::const_iterator;` will make it cleaner. – Mirko Apr 13 '19 at 22:18
  • 1
    I like the callback approach, passing a consumer object with a callback for each kind of result produced. When the method return you give the guaranty that at most one callback has been called. But you could also have an async variant. The API could evolve by adding more callback to the consumer. std::variant is also good but may require the user to check which one is available (doesn't really change from two optionals). – rkouye Apr 13 '19 at 22:39

3 Answers3

18

I agree that mandating vector is inappropriate, and applaud your attempts to make the interface more useful.

If decode expects a contiguous sequence of uint8_t, the tried-and-tested (and most flexible) solution is just to take a const uint8_t* and a std::size_t (or alternatively two pointers, but pointer and length is more idiomatic).

From C++20 you can do this with one argument of type std::span<const uint8_t>. Or going back to pointers, if you really want to use modern library tools for the sake of it, you can confuse people with std::experimental::observer_ptr.

You may also consider making decode a template that accepts any iterator pair, and (if contiguity is needed) mandates, even if only by documentation, that the iterators reflect a contiguous sequence. But making everything a template isn't always what you want, and it isn't always useful.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • 2
    There's many, many algorithms in the standard library that expect a contiguous sequence of some type (to name just one: std::sort). But as far as I can see, none of them takes a T* and length, they all use iterators. Seems very error prone. – Voo Apr 14 '19 at 09:00
  • 3
    @Voo: A `T*` _is_ an iterator... and the second iterator you pass is `raw_pointer + length` - another pointer. – einpoklum Apr 14 '19 at 09:01
  • 1
    @einpoklum Every T* is an iterator but not every iterator is a T*, which is pretty much my point. – Voo Apr 14 '19 at 09:01
  • 6
    @Voo, why do you think, that `std::sort` expects contiguous memory? It requires just random access iterators. – magras Apr 14 '19 at 09:03
  • 1
    @magras Fair enough it's not mandated, but that doesn't really answer the question: Why use a more error prone and limited interface instead of the standard approach here? – Voo Apr 14 '19 at 09:04
  • 3
    @Voo, if library is not header only (I doubt that codec library is) or closed source, I see no good way to use templates in public API. – magras Apr 14 '19 at 09:11
  • @magras That can certainly make it more cumbersome, not impossible though - depends on the exact code. Ranges certainly make for cleaner code if that's an option. – Voo Apr 14 '19 at 11:01
  • @Voo, technically you are right, it's possible to take a templated range, ask `data()` and `size()`, and forward call to an actual implementation. But you will need to ship some parts of C++20 to make it work ergonomically. – magras Apr 14 '19 at 11:39
  • 1
    `observer_ptr` is specifically to point to *single* objects, right? i.e. not arrays. hence no pointer arithmetic for instance. – Arvid Apr 14 '19 at 15:46
  • @Voo: That's where the final paragraph comes in. As for "more error-prone", I really don't see how. – Lightness Races in Orbit Apr 15 '19 at 10:04
  • @Arvid As the linked reference page says, it's supposed to be a "pretty" drop-in for raw pointers. You can use a `char*` to subscript an array, and I can't see why the same wouldn't be true for `observer_ptr`. – Lightness Races in Orbit Apr 15 '19 at 10:05
  • @lightness For the same reasons that a `char *, size_t` argument is more error prone than a `std::string` parameter. You have two linked parameters that are stored separately and you can unintentionally pass a wrong combination along. I somehow missed the last paragraph though, mea culpa. – Voo Apr 15 '19 at 10:56
  • @Voo But if you're passing iterators you still need two arguments. Arguably the `span` "solves" this but not really because you still have to construct it properly, so it's just moving the problem (such as it is) elsewhere. Ultimately, you always have to accurately describe the range and that always involves somehow describing its start, and its end. There's no way around that, and I don't see that `char*, size_t` makes it worse. And, really, if you can't pass a `char*, size_t` successfully then you have no business being in our business :P (where "you" = some arbitrary person) – Lightness Races in Orbit Apr 15 '19 at 11:08
  • @Lightness True, but the standard way to use iterators is to do the separation at the call point (vec.begin(), vec.end()) while pointer and length are usually stored and passed separately along. You don't have to, but providing such an API seems to me to encourage that style. But yes that's one of the downsides of the approach of the STL. – Voo Apr 15 '19 at 11:13
  • "And, really, if you can't pass a char*, size_t successfully then you have no business being in our business" Shall I go and fetch the number of exploits due to out of bounds accesses and wrong range computations? People make mistakes, good API design's goal should be to point them towards the safe solutions. – Voo Apr 15 '19 at 11:14
  • @Voo: Did you deliberately leave the ":P" out of the quote? – Lightness Races in Orbit Apr 15 '19 at 11:19
15

In addition to @Justin's valid suggestion of spans:

  • You might also want to consider using std::byte instead of uint8_t, so:
    Result decode(std::span<const std::byte> buffer);
    
    or if you're in C++17, use the span implementation from the C++ Guidelines Support library:
    #include <gsl/span>
    // etc.
    Result decode(gsl::span<const std::byte> buffer);
    
  • If you want to support decoding from containers other than raw memory, use arbitrary iterators (in C++17 and earlier) or possibly ranges (in C++20). The iterator version:

    template <typename InputIt>
    Result decode(InputIt start, InputIt end) { /* etc. */ }
    
  • It's fishy that a Decoder inherits from a Codec rather than the other way around.

  • The question of whether callbacks are a good choice or not is something that's difficult (for me) to answer without seeing the code. But do indeed use an std::variant to express the fact you have either a Packet or Metadata; you could also "combine" alternatives if instead of callbacks you use variants' std::visit.
einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • Decoder and another class, Encoder, inherit from Codec because Codec provides the majority of maintained state and a bunch of shared logic. Unfortunately, I'm not allowed to share the code; just the interface. Your suggestion of using `std::visit` looks promising. Thanks! – splicer Apr 13 '19 at 23:06
  • @splicer: In that case, consider either renaming Encoder or perhaps taking some functionality out into a namespace. – einpoklum Apr 13 '19 at 23:07
  • 2
    +1 for recommending `span`. I would even suggest implementing your own (simple) span if you're not allowed to use a library or C++20 – Arvid Apr 14 '19 at 15:49
  • 2
    @Arvid: Justin beat me to it, you should +1 his answer. Also - GSL is header-only, it's really not a big deal to use it, and if it's the fact that it's multiple files - there's always [GSL-lite](https://github.com/martinmoene/gsl-lite). – einpoklum Apr 14 '19 at 16:15
5

C++20 will have std::span, which does what you want:

    Result decode(std::span<uint8_t const> buffer);

std::span<T> is semantically equivalent to a T* buffer, size_t size.


In C++17, there are some implementations of a span type which are equivalent to std::span, such as the GSL's gsl::span. See What is a "span" and when should I use one? .

If you can't use any external libraries, consider writing your own span type, else uint8_t const* buffer_begin, uint8_t const* buffer_end can work.

Justin
  • 24,288
  • 12
  • 92
  • 142
  • 1
    Rather than writing my own `span`, do you think `basic_string_view` would make a good interim solution? – splicer Apr 14 '19 at 01:40
  • 1
    @splicer I almost suggested `basic_string_view`, but it's very situational. If you can guarantee that the input buffer would be null-terminated, it can work. If you can't, it's dangerous. – Justin Apr 14 '19 at 02:06
  • 3
    When `basic_string_view` is instantiated with both a pointer and a size, it allows for null characters to be stored mid-string. Of course, it's still too error-prone to use in the API. *However*, it does turn out to be useful on the implementation side if I go with the pointer & length API style. – splicer Apr 14 '19 at 04:00