
Why doesn't `data()` return a `void*`? Or, even better, a template returning `T*`?

If the memory-mapped file contains a sequence of 32-bit integers and `data()` returned a `void*`, we would be able to `static_cast` it to `std::uint32_t*` directly.

Why did the Boost authors choose to return a `char*` instead?

EDIT: As pointed out, a translation step is needed if portability is a concern. But claiming that a file (or, in this case, a chunk of memory) is a stream of bytes any more than it is a stream of bits, of IEEE 754 doubles, or of complex data structures seems to me a very broad statement that needs more explanation.

Even when endianness has to be handled, being able to map the data directly to an array of `be_uint32_t`, as suggested (and as implemented here), would make the code much more readable:

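```cpp
#include <cstdint>
#include <arpa/inet.h>  // ntohl (POSIX; Windows provides it in <winsock2.h>)

struct be_uint32_t {
  std::uint32_t raw;  // value stored in big-endian (network) byte order
  operator std::uint32_t() const { return ntohl(raw); }
};

static_assert(sizeof(be_uint32_t) == 4, "be_uint32_t: unexpected size");
```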

Is it allowed/advised to cast to a be_uint32_t*? Why, or why not?

Which kind of cast should be used?

EDIT 2: Since it seems difficult to get to the point, rather than debating whether a computer's memory model is made of bits, bytes, or words, I will rephrase the question with an example:

```cpp
#include <cassert>
#include <cstddef>   // offsetof
#include <cstdint>
#include <iostream>
#include <boost/iostreams/device/mapped_file.hpp>

struct entry {
  std::uint32_t a;
  std::uint64_t b;
} __attribute__((packed)); /* compiler specific, but supported
                              in other ways by all major compilers */

static_assert(sizeof(entry) == 12, "entry: Struct size mismatch");
static_assert(offsetof(entry, a) == 0, "entry: Invalid offset for a");
static_assert(offsetof(entry, b) == 4, "entry: Invalid offset for b");

int main() {
  boost::iostreams::mapped_file_source mmap("map");
  assert(mmap.is_open());
  // View the mapped bytes in place as an array of packed entries (no copy).
  const entry* data_begin = reinterpret_cast<const entry*>(mmap.data());
  const entry* data_end = data_begin + mmap.size() / sizeof(entry);
  for (const entry* ii = data_begin; ii != data_end; ++ii)
    std::cout << std::hex << ii->a << " " << ii->b << std::endl;
  return 0;
}
```

Given that the mapped file contains the expected bits in the correct order, is there any other reason to avoid using `reinterpret_cast` to access my virtual memory in place, without copying it first?

If not, why force the user into a `reinterpret_cast` by returning a typed pointer?

Please answer all the questions for bonus points :)

baol
  • You could use `reinterpret_cast`... – Brian Bi May 14 '15 at 18:25
  • `void` is nothing. It's useless to dereference it. Mmaps aren't designed to be useless – sehe May 14 '15 at 20:34
  • @sehe: void would just mean: "I don't know what I'm pointing to, please make sure that you do before accessing the data!". It would make more sense to me than being certain that it is pointing to bytes even if it is indeed pointing to little endian 32 bit integers! – baol May 16 '15 at 16:24
  • The point is that the mapped file does know. It points to bytes – sehe May 16 '15 at 16:25
  • @sehe: No, it points to what I wrote into the file. – baol May 16 '15 at 16:28
  • @baol it points to bytes and only bytes. You should read Lightness' one more time – sehe May 16 '15 at 16:48
  • @sehe: there are useful use cases in which the translation can be safely omitted. – baol May 16 '15 at 17:06
  • @sehe: and even in cases in which the translation is needed it may be more useful to see the file as a stream of little endian 32 bits integers than just bytes! It's a matter of using the correct abstraction. – baol May 16 '15 at 17:08
  • @baol So, in the end you don't really ask a question here. You just wanted to rant about how you think it's ridiculous that maps map regions of `char` data in memory. That that's not "_using the correct abstraction_" (ugh). Maybe start a blog. (If you don't want to hear other people's arguments, be sure to lock the comments.) – sehe May 16 '15 at 18:44
  • If you are still interested look through my answers for [ideas on how to use mapped/shared memory **with** the proper abstractions](http://stackoverflow.com/search?tab=votes&q=managed_mapped_file%20managed_shared_memory). – sehe May 16 '15 at 18:44

3 Answers


> In case the memory mapped file contains a sequence of 32 bit integers, if data() returned a void*, we could be able to static cast to std::uint32_t directly.

No, not really. You still have to consider (if nothing else) endianness. This "one step conversion" idea would lead you into a false sense of security. You're forgetting about an entire layer of translation between the bytes in the file and the 32-bit integer you want to get into your program. Even when that translation happens to be a no-op on your present system and for a given file, it's still a translation step.

It's much better to get an array of bytes (literally what a `char*` points to!): then you know you have to do some thinking to ensure that your pointer conversion is valid and that you are performing whatever other work is required.
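A minimal sketch of that explicit translation step, assuming (hypothetically) that the file stores big-endian 32-bit integers; the helper name `load_be32` is illustrative and not part of Boost. Assembling each value from individual bytes works on any host byte order and places no alignment requirement on the pointer:

```cpp
#include <cstdint>

// Hypothetical helper: decode a big-endian 32-bit unsigned integer at p.
// Building the value byte by byte is independent of the host's byte order
// and does not require p to be aligned for std::uint32_t.
inline std::uint32_t load_be32(const char* p) {
  const unsigned char* b = reinterpret_cast<const unsigned char*>(p);
  return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
         (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
}
```

Walking the file would then be a loop over `load_be32(mmap.data() + 4 * i)` for `i` below `mmap.size() / 4`.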

Lightness Races in Orbit
  • For endianness I recommend making a set of classes like `le_uint32_t` which have `uint32_t` conversions and an `operator=` but avoid constructors, or whatever it takes to maintain PODness in C++11 now. – Zan Lynx May 14 '15 at 23:18
  • Not sure that a char is guaranteed to be a byte either. It seems to me that char* may be more misleading than void* from the translation point of view. (btw: good that you pointed out the false sense of security, be aware that it is common to all forms of untested software :) – baol May 15 '15 at 19:17
  • In the general case are you suggesting to copy the bytes to the destination data structure that will hold them one by one in the correct place, instead of mapping directly to a packed structure? Instead of solving the problem of endianness as Zan suggested? – baol May 15 '15 at 19:33
  • @baol: _"Not sure that a char is guaranteed to be a byte either."_ Your lack of certainty notwithstanding, that is indeed a guarantee. – Lightness Races in Orbit May 16 '15 at 03:56
  • @baol: That is about whether a byte is guaranteed to be an octet (it isn't). Not whether a `char` is guaranteed to be a byte (it is). :) – Lightness Races in Orbit May 16 '15 at 13:50
  • Agreed, my confusion! :) But while debating how many angels can dance on the head of a pin, we missed the point! The point was about the absolute portability of `char*` between architectures. If a byte (or a `char`) is not guaranteed to be 8 bits, returning a `char*` does not guarantee portability either! – baol May 16 '15 at 15:21
  • The "C++-byte"; another product of the standard's questionable taste for terminology. – Columbo May 16 '15 at 16:55
  • @Columbo: I've never heard that phrase before _you_ used it. – Lightness Races in Orbit May 17 '15 at 04:05
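Zan Lynx's suggestion could look roughly like the following minimal sketch. Storing the value as a 4-byte array (rather than a `uint32_t` member) is an assumption of this sketch, chosen so the wrapper has alignment 1 and, having no constructors, stays trivial:

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical wrapper for a little-endian 32-bit field stored in a file.
// No user-declared constructors, so the type remains trivial and
// standard-layout; its only member is a byte array, so its alignment is 1.
struct le_uint32_t {
  unsigned char bytes[4];  // least significant byte first

  // Read: assemble the value from bytes, independent of host byte order.
  operator std::uint32_t() const {
    return std::uint32_t(bytes[0]) | (std::uint32_t(bytes[1]) << 8) |
           (std::uint32_t(bytes[2]) << 16) | (std::uint32_t(bytes[3]) << 24);
  }

  // Write: store the value in little-endian byte order.
  le_uint32_t& operator=(std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
      bytes[i] = static_cast<unsigned char>(v >> (8 * i));
    return *this;
  }
};

static_assert(sizeof(le_uint32_t) == 4, "le_uint32_t: unexpected size");
static_assert(alignof(le_uint32_t) == 1, "le_uint32_t: unexpected alignment");
static_assert(std::is_trivial<le_uint32_t>::value, "le_uint32_t: not trivial");
static_assert(std::is_standard_layout<le_uint32_t>::value,
              "le_uint32_t: not standard-layout");
```

Declaring `operator=(std::uint32_t)` does not suppress the implicitly declared copy assignment operator, so the type stays trivially copyable.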

A `char*` represents an array of raw bytes, which is what `mapped_file::data()` points to in the most general case.

`void*` would be misleading, as it provides less information about the contained type and requires more setup to work with than `char*`: we know that the file contents are some bytes, which is exactly what `char*` represents.

A template return type would require the conversion to that type to be performed inside the library, while it makes more sense to do it on the caller side (since the library just provides an interface to the raw file contents, and the caller knows specifically what those contents are).
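One such caller-side adapter, sketched under the assumption that the element type is trivially copyable and already matches the file's byte order; the name `read_as` is hypothetical and not part of Boost. It trades the zero-copy view for a small copy per element, which sidesteps alignment and aliasing concerns:

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical caller-side adapter: copy sizeof(T) bytes starting at
// base + offset into a T. std::memcpy has no alignment requirement on the
// source; no endianness conversion is performed here.
template <class T>
T read_as(const char* base, std::size_t size, std::size_t offset) {
  static_assert(std::is_trivially_copyable<T>::value,
                "read_as requires a trivially copyable type");
  if (offset > size || size - offset < sizeof(T))
    throw std::out_of_range("read_as: read past end of buffer");
  T value;
  std::memcpy(&value, base + offset, sizeof(T));
  return value;
}
```

With the question's `entry` type, a call might look like `read_as<entry>(mmap.data(), mmap.size(), i * sizeof(entry))`.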

Ilya Kobelevskiy
  • But what if my mapped file contains something else than bytes? Do you see the need for copying the bytes over to my data structure instead of simply casting to what I need in place? (assuming I have control on the file and do not need to port the file on a different architecture) – baol May 15 '15 at 19:36
  • "what if my mapped file contains something else than bytes" I don't know of anything in a computer that is not made up of bytes. What do you mean? – edmz May 16 '15 at 12:55
  • @black I mean that a byte is an assumption on the contained data as valid as any other. I may know for sure that my file is an array of nibbles! Does this clarify? – baol May 16 '15 at 13:38
  • @baol Yes. But that's what `char*` is for: you give it what it means. Now, is it as reliable as a template instantiation? No, it isn't by far. You may fail to give it a right type and make the program crash. However, with templates, each time you call the function a new instantiation could be needed. – edmz May 16 '15 at 15:43
  • @black, I think from the point of view of the Boost library developers `char*` is by far *more* reliable than template instantiation: the library only considers the common general case, which is raw bytes. As a library user, you are free to develop and support your own adapter which can convert those bytes further to whatever you like using template instantiations. – Ilya Kobelevskiy May 20 '15 at 18:59

Returning a `char*` seems to be just a (peculiar) design decision of the boost::iostreams implementation.

Other APIs, e.g. Boost.Interprocess (`mapped_region::get_address()`), return `void*`.

As observed by sehe, the UNIX `mmap` specification (and `malloc`) use `void*` as well.

This question is, to some extent, a duplicate of "void* or char* for generic buffer representation?"

As a note of caution, the layer of translation mentioned by Lightness in another answer may be needed when the memory is written on one architecture and read on a different one. Endianness is easy to solve using a conversion type, but alignment needs to be considered as well.

About `static_cast`, http://en.cppreference.com/w/cpp/language/static_cast says:

> A prvalue of type pointer to void (possibly cv-qualified) can be converted to pointer to any type. If the value of the original pointer satisfies the alignment requirement of the target type, then the resulting pointer value is unchanged, otherwise it is unspecified. Conversion of any pointer to pointer to void and back to pointer to the original (or more cv-qualified) type preserves its original value.

So if the file to be memory mapped was created on an architecture with different alignment rules, accessing its contents in place may fail (e.g. with a SIGBUS), depending on the architecture and the OS.
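A minimal sketch of guarding against that before casting; `aligned_view` is a hypothetical helper name. It converts through `void*` and applies `static_cast`, the conversion described in the quote above, only when the address satisfies `alignof(T)`. It checks alignment only; byte order still has to be handled separately:

```cpp
#include <cstdint>

// Hypothetical helper: view raw mapped bytes as const T* only if the address
// satisfies T's alignment; otherwise return nullptr so the caller can fall
// back to copying the bytes (e.g. with std::memcpy).
template <class T>
const T* aligned_view(const char* p) {
  const void* vp = p;  // char* -> void* converts implicitly
  if (reinterpret_cast<std::uintptr_t>(vp) % alignof(T) != 0)
    return nullptr;    // misaligned: dereferencing could fault (e.g. SIGBUS)
  return static_cast<const T*>(vp);  // void* -> T*, value unchanged if aligned
}
```

Returning `nullptr` rather than asserting is a design choice of this sketch: it lets the caller pick a copying fallback at run time instead of crashing on a misaligned mapping.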

baol