19

I'm designing a Buffer class whose purpose is to represent a chunk of memory.

My underlying buffer is a char* (well, a boost::shared_array<char> actually, but it doesn't really matter).

I'm stuck deciding which prototype to choose for my constructor:

Should I go with:

Buffer(const void* buf, size_t buflen);

Or with:

Buffer(const char* buf, size_t buflen);

Or something else?

What is usually done, and why?

ereOn
  • @Luca Matteis: I intend to provide implicit sharing of memory. But that's quite off-topic. – ereOn Sep 28 '10 at 13:27
  • There is an unfortunate transmogrification - prompted by Unicode - in that `char` is becoming a synonym for `octet` and has little to do with "character". This question is symptomatic of that change; I've been tempted to `typedef char octet;` at times to make the code less misleading. – msw Sep 28 '10 at 13:29
  • @msw: Other than type compatibility with string literals, is there any reason to use "char", as distinct from "signed char" or "unsigned char"? It seems a rather useless type to me. – supercat Sep 28 '10 at 15:09
  • @supercat: I agree with your point; if I used C more these days, I'd probably care enough to assert that "signed char" be stricken from the language at least. Put another way, I really meant `typedef unsigned char octet` if I'd given it more thought. – msw Sep 29 '10 at 03:09
  • @msw: Why would you eliminate one of the useful and essential types? Type 'char' is the more useless one, since its useful range is only 0..127. – supercat Sep 29 '10 at 13:31
  • See also [With std::byte standardized, when do we use a void* and when a byte*?](https://stackoverflow.com/posts/comments/103681769?noredirect=1) – bobobobo Nov 06 '19 at 22:00

6 Answers

17

For the constructor and other API functions, the advantage of void* is that it allows the caller to pass in a pointer to any type without having to do an unnecessary cast. If it makes sense for the caller to be able to pass in any type, then void* is preferable. If it really only makes sense for the caller to be able to pass in char*, then use that type.
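
A minimal sketch of that trade-off, assuming a hypothetical copying `Buffer` (the `std::vector` storage, the `Header` struct, and `make_header_buffer` are illustrative names, not part of the question):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical Buffer for illustration only: it copies the caller's bytes.
class Buffer {
public:
    // const void* accepts a pointer to any object type without a cast.
    Buffer(const void* buf, std::size_t buflen) : data_(buflen) {
        assert(buf != nullptr || buflen == 0);
        if (buflen != 0) std::memcpy(data_.data(), buf, buflen);
    }
    std::size_t size() const { return data_.size(); }
    const unsigned char* data() const { return data_.data(); }

private:
    std::vector<unsigned char> data_;
};

// Hypothetical payload type.
struct Header {
    unsigned short id;
    unsigned short length;
};

// No cast at the call site. Had the parameter been const char*, the call
// would need reinterpret_cast<const char*>(&h).
inline Buffer make_header_buffer(const Header& h) {
    return Buffer(&h, sizeof h);
}
```

If callers only ever hold `char*` data, the `const char*` overload costs them nothing; the cast only becomes a burden once other object types are passed in.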

Mike Morearty
16

The API is clearer to the user if a buffer has type void* and a string has type char*. Compare the declarations of memcpy and strcpy.
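
For reference, a small sketch of the difference. The two standard declarations are quoted in the comments; `copy_bytes` and `copy_text` are hypothetical wrappers added only to make the contrast visible:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Standard declarations, shown for comparison:
//   void* memcpy(void* dest, const void* src, std::size_t count);
//   char* strcpy(char* dest, const char* src);

// Copies raw bytes; embedded zero bytes are copied like any other byte.
inline void copy_bytes(void* dest, const void* src, std::size_t count) {
    assert(dest != nullptr && src != nullptr);
    std::memcpy(dest, src, count);
}

// Copies a C string; copying stops after the first '\0'.
// Returns the length of the copied string.
inline std::size_t copy_text(char* dest, const char* src) {
    assert(dest != nullptr && src != nullptr);
    std::strcpy(dest, src);
    return std::strlen(dest);
}
```

The signatures alone tell the reader which function treats its argument as text and which treats it as opaque memory.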

Alex F
  • Calling something `char*` _implies it is signed data_ or character data. If the data _is not_ characters, or to be interpreted as signed bytes, then you should not use `char*` to point to it. – bobobobo Nov 04 '19 at 11:16
9

C++17

C++17 introduced std::byte specifically for this.

Its definition is actually simple: enum class byte : unsigned char {};.
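
A short sketch of what that enum does and does not allow, assuming a C++17 compiler. The helper names `low_nibble`, `add_wrapping`, and `byte_at` are made up for this example:

```cpp
#include <cassert>
#include <cstddef>

// Bitwise operators are provided for std::byte.
inline std::byte low_nibble(std::byte b) {
    return b & std::byte{0x0F};
}

// Arithmetic is not provided, so it needs an explicit round trip
// through an integer type.
inline std::byte add_wrapping(std::byte a, std::byte b) {
    // a + b would not compile: std::byte has no arithmetic operators.
    unsigned sum = std::to_integer<unsigned>(a) + std::to_integer<unsigned>(b);
    return static_cast<std::byte>(sum & 0xFFu);
}

// Reads one element of a byte buffer as an unsigned integer.
inline unsigned byte_at(const std::byte* buf, std::size_t len, std::size_t i) {
    assert(i < len);
    return std::to_integer<unsigned>(buf[i]);
}
```

Whether the missing arithmetic is a feature or a nuisance depends on how the buffer contents are processed.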


I generally used unsigned char as the underlying type (I don't want signedness to mess with my buffer for who knows what reason). However, I usually typedefed it:

// C++11
using byte = unsigned char;

// C++98
typedef unsigned char byte;

And then refer to it as byte* which neatly conveys the meaning in my opinion, better than either char* or void* at least.
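
As an illustration of the signedness concern, here is a sketch using the typedef above. The functions `count_ff` and `count_ff_plain_char` are hypothetical helpers, not part of any library:

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char byte;  // the alias from the answer

// Counts 0xFF bytes. An unsigned char promotes to a value in 0..255,
// so the comparison with 0xFF behaves the same on every platform.
inline std::size_t count_ff(const byte* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (buf[i] == 0xFF) ++n;
    return n;
}

// The same check on plain char needs a cast: where char is signed,
// the stored byte promotes to -1, which never equals 255.
inline std::size_t count_ff_plain_char(const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (static_cast<unsigned char>(buf[i]) == 0xFF) ++n;
    return n;
}
```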

Matthieu M.
  • I can't +1 this hard enough. This relieves all those bugs with `char` such as `char buf[42]; ... if(buf[i] == 0xFF) // *never true*` – Thanatos Sep 28 '10 at 18:13
  • There is one thing you need to pay attention to with this - if the buffer contains any null bytes, it will be treated as a null-terminator for the C string and lessen the supposed size of the "buffer" returned by sizeof. – tomysshadow May 06 '18 at 08:16
  • `sizeof` is not the same as `strlen`, and does not take null terminators into account – Daniel Stevens Jun 04 '18 at 06:30
  • This should be the accepted answer. That char and unsigned char thing is really annoying, and totally anti-intuitive. – John Z. Li Sep 05 '18 at 08:22
  • @JohnZ.Li: Thanks to your comment I noticed that my answer was out of date. C++17 introduced `enum class byte : unsigned char {};`, even better than a typedef! – Matthieu M. Sep 05 '18 at 08:31
  • -1. `std::byte` is _not_ an `unsigned char`. You're not [indicating the limitations of `std::byte`](https://stackoverflow.com/a/58691002/) -- you cannot even do `a+b` where `a` and `b` are both type `std::byte` – bobobobo Nov 04 '19 at 11:20
  • @bobobobo: The definition of `std::byte` can be found on [cppreference](https://en.cppreference.com/w/cpp/types/byte), it's an enum whose underlying type is an `unsigned char`. You are correct that it does not support arithmetic operations, but I fail to see how it matters when the goal is to replace `void*` which does not **either**. – Matthieu M. Nov 04 '19 at 12:49
  • @MatthieuM since the bitwise ops are defined for `std::byte` you should not use `std::byte` over `void*` _unless you need that functionality_ in your pointer – bobobobo Nov 05 '19 at 00:16
  • @bobobobo: That's a very subjective guideline. I prefer using `std::byte*` for byte buffers, and `void*` for C-style "handles", which is another subjective guideline. – Matthieu M. Nov 05 '19 at 07:47
  • @MatthieuM It's more a _minimal interface_ principle -- if you won't use bitwise ops on the buffer, why do you imply you might by how you point to it? It helps you more quickly understand how _the code_ sees this data – bobobobo Nov 05 '19 at 07:57
  • @bobobobo: I understand the principle, however here it conflicts with another principle of conveying intent. At some point it's a judgement call whichever is more valuable to you; in my experience, the latter has been more valuable. YMMV. – Matthieu M. Nov 05 '19 at 10:03
  • @MatthieuM. I would stick with `unsigned char*` to point to byte buffers unless you want to explicitly forbid basic math ops like `+`, `-` etc – bobobobo Nov 05 '19 at 10:11
7

I'd prefer char*, because to me it fits better with being "a buffer". void* seems more like "a pointer to I don't know what". Besides, it is what your underlying storage is anyway.

Eli Bendersky
7

I'd recommend uint8_t, which is defined in stdint.h (cstdint in C++). It's basically the same thing as the "typedef unsigned char byte;" that others have been recommending, but it has the advantage of being part of the C standard since C99.

As for void*, I would only use that for polymorphism, i.e. I'd only call something a void pointer if I didn't yet know what type of thing it would be pointing to. In your case you've got an array of bytes, so I'd label it as such by using uint8_t* as the type.
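
A brief sketch of a function written against `uint8_t*`, assuming a platform that provides the type. The `checksum` helper is hypothetical and only shows that the parameter type documents "array of bytes":

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sums the bytes of a buffer modulo 256.
inline std::uint8_t checksum(const std::uint8_t* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    std::uint8_t sum = 0;
    for (std::size_t i = 0; i < len; ++i)
        sum = static_cast<std::uint8_t>(sum + buf[i]);  // wraps at 256
    return sum;
}
```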

karadoc
3

I prefer unsigned char * or uint8_t * for buffer implementations, since void * has the annoying restriction that you can't perform pointer math on it. So if you want to process some data at some offset from the buffer, or just break your buffer up into chunks or whatever, you are stuck casting to some other type anyway to do the math.
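
The restriction can be sketched as follows; `at_offset` and `at_offset_void` are hypothetical helpers used only to contrast the two pointer types:

```cpp
#include <cassert>
#include <cstddef>

// Returns a pointer `offset` bytes into the buffer.
// Pointer arithmetic works directly because the element size is 1.
inline const unsigned char* at_offset(const unsigned char* buf,
                                      std::size_t len, std::size_t offset) {
    assert(offset <= len);
    return buf + offset;
}

// With void*, arithmetic is not allowed in standard C++,
// so a cast to a byte pointer has to come first.
inline const void* at_offset_void(const void* buf,
                                  std::size_t len, std::size_t offset) {
    assert(offset <= len);
    // return buf + offset;  // ill-formed in ISO C++
    return static_cast<const unsigned char*>(buf) + offset;
}
```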

I prefer unsigned char * or uint8_t * over plain char * because of the special rules regarding aliasing and char *, which has the potential to seriously pessimize some loops working on your buffers.

BeeOnRope