
I need to use shared memory between processes, and I found some sample code here. First, I need to learn how to create a shared memory block and store a string in it. To do that, I used the following code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#include <unistd.h>

void* create_shared_memory(size_t size) {
  // Our memory buffer will be readable and writable:
  int protection = PROT_READ | PROT_WRITE;

  // The buffer will be shared (meaning other processes can access it), but
  // anonymous (meaning third-party processes cannot obtain an address for it),
  // so only this process and its children will be able to use it:
  int visibility = MAP_ANONYMOUS | MAP_SHARED;

  // The remaining parameters to `mmap()` are unused for an anonymous mapping:
  // the file descriptor is passed as -1 (some systems require this) and the
  // offset as 0. The manpage for `mmap` explains their purpose.
  return mmap(NULL, size, protection, visibility, -1, 0);
}



int main() {
  char msg[] = "hello world!";

  void* shmem = create_shared_memory(1);
  printf("sizeof shmem: %zu\n", sizeof(shmem));
  printf("sizeof msg: %zu\n", sizeof(msg));
  memcpy(shmem, msg, sizeof(msg));
  printf("message: %s\n", shmem);

}

Output:

sizeof shmem: 8
sizeof msg: 13
message: hello world!

In the main function, I create a 1-byte shared memory block (shmem) and try to store 13 bytes of data (char msg[]) in it. When I print shmem, it prints the whole message. I expected it to print just 1 byte, which in this case is "h", or to give an error about the memory size at compile time.

Am I missing something here? Or is there an implementation issue? Does memcpy overlap here? I would appreciate a brief explanation.

Thanks in advance.

Ersel Er
  • The underlying OS mechanism does not operate on one byte-sized chunks. In fact, almost nothing in your computer does. – pvg Nov 12 '17 at 14:15
  • @pvg So if I allocate 128 bytes and try to store 129 bytes of data (129 chars) in it, how does it work? Does it raise an error? – Ersel Er Nov 12 '17 at 14:18
  • @ErselEr this is described in the documentation for `mmap` which you should review. "If offset or len is not a multiple of the pagesize, the mapped region may extend past the specified range. Any extension beyond the end of the mapped object will be zero-filled." – pvg Nov 12 '17 at 14:24
  • I think you must fill all locations of the shared memory with `\0` – EsmaeelE Nov 12 '17 at 14:53

2 Answers

  1. In printf("message: %s\n", shmem);, the %s specifier says to print the “string” starting at shmem. For this purpose, a string is a sequence of characters ending with the null character. So the printf prints all the bytes it finds at shmem up to the null character. To limit it to at most one character, you can use %.1s instead, or you can explicitly print a character with printf("message: %c\n", * (char *) shmem);.

  2. When you allocate memory with mmap, the system works with memory in units of pages. The size of a page varies from system to system, but it is typically something like 512 or 4096 bytes, not 1. The standard specification for mmap only guarantees that the number of bytes you request is provided. There may be additional bytes accessible beyond this, but you should not rely on them being available. (Even if they appear to be available momentarily, the system might not save them to disk when your program is temporarily swapped out of memory, so they will not be restored when your program is brought back into memory to continue running.)

  3. sizeof(shmem) provides the size of shmem, which is a pointer. So it provides the size of the pointer, which is usually four or eight bytes on modern systems. It does not provide the size of the thing that shmem points to.

  4. In contrast, in sizeof(msg), msg is an array, not a pointer, so sizeof(msg) does provide the size of the array, as you likely intend.

  5. memcpy(shmem, msg, sizeof(msg)); copies 13 bytes (the size of your msg) into shmem. Those thirteen bytes are “hello world!” and a null character (value 0) at the end. memcpy does not have any way of knowing how long the source or destination is except for the length parameter that you pass. So it copies sizeof(msg) bytes. It does not limit itself to the size of the memory pointed to by shmem. It is your job to pass the correct length.
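Points 1 and 5 above can be sketched as two small helpers. Both names (`format_prefix` and `bounded_copy`) are hypothetical and not part of any library; they only illustrate that the caller, not `printf` or `memcpy`, decides how many bytes are touched. This is a minimal sketch, not a definitive implementation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats at most `max_chars` characters of the
 * string at `src` into `out`. The precision in "%.*s" caps how many
 * bytes the printf family reads, just like "%.1s" does with a fixed 1. */
int format_prefix(char *out, size_t out_size, const void *src, int max_chars) {
  return snprintf(out, out_size, "%.*s", max_chars, (const char *)src);
}

/* Hypothetical helper: copies `src` into a buffer of `dst_size` bytes,
 * truncating if needed and always writing a terminating null character.
 * Returns the number of characters copied (excluding the terminator). */
size_t bounded_copy(void *dst, size_t dst_size, const char *src) {
  if (dst_size == 0) {
    return 0;  /* No room even for the terminator. */
  }
  size_t n = strlen(src);
  if (n > dst_size - 1) {
    n = dst_size - 1;  /* Leave one byte for the null character. */
  }
  memcpy(dst, src, n);
  ((char *)dst)[n] = '\0';
  return n;
}
```

With a 1-byte buffer, `bounded_copy` can store only the terminator, which shows that a 1-byte mapping cannot legitimately hold even the single character "h" as a string.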

To answer your question about what happens if you use more bytes than mmap provides, the behavior is undefined. If you go beyond a page boundary, it is most likely that your program will crash because memory beyond that address is not mapped. But you might write bytes to a place in your memory you did not want to, and that can cause any variety of things to happen, because it can damage code or data that your program needs to execute properly.

In this case, you did not write beyond mapped memory. You asked for 1 byte and were likely given 4096 (or whatever one page on your system is). Then you copied 13 bytes into the buffer and printed them. So everything “worked.”
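To see why a 1-byte request can end up backed by a whole page, the following sketch queries the page size with `sysconf(_SC_PAGESIZE)` and shows the rounding arithmetic. The helper names `system_page_size` and `round_up_to_page` are hypothetical, and the rounding is an assumption about what a typical kernel does internally. It does not mean the extra bytes are yours to use.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: returns the system page size in bytes,
 * or 0 if it cannot be determined. */
size_t system_page_size(void) {
  long ps = sysconf(_SC_PAGESIZE);
  return ps > 0 ? (size_t)ps : 0;
}

/* Hypothetical helper: rounds `len` up to a whole number of pages.
 * This mirrors how a mapping is typically backed by full pages;
 * only the first `len` bytes are guaranteed to be usable. */
size_t round_up_to_page(size_t len, size_t page_size) {
  return (len + page_size - 1) / page_size * page_size;
}
```

For example, `round_up_to_page(1, 4096)` gives 4096, and `round_up_to_page(4097, 4096)` gives 8192.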

Eric Postpischil
  • In the case of a file-backed mapping, writing to the excess capacity of the last page is defined to do nothing (and reads will give zero). In the case of an anonymous mapping (as in the question), what standard says you're allowed to use the excess capacity of the last page? – John Zwinck Nov 12 '17 at 14:33
  • @JohnZwinck: My mistake, you are correct for `mmap` in general. I will update. – Eric Postpischil Nov 12 '17 at 14:48
  • I wouldn't use the term 'undefined behaviour' in this context since it's easy to confuse with the way it's most commonly used for behaviour undefined by the C standard, of which this isn't an example. Posix docs seem to like 'implementation-dependent' which isn't a big leap in clarity (or perhaps even accuracy) but it is at least distinct. – pvg Nov 12 '17 at 16:40
  • @pvg: I do not comprehend why people hold the C standard above all else in this regard. In order to know what a program will do, you need to know things about the hardware it runs on, things about the operating system it runs under, things about the compiler it is compiled with, things about the libraries it is linked with, and things about the linker and other tools it is built with. All of these things are prerequisites. If a logical deduction can be made from the prerequisites to an end result, the behavior is defined. Otherwise, it is not defined. The C standard is not a special part of this. – Eric Postpischil Nov 12 '17 at 18:14
  • It doesn't have much to do with people 'holding the C standard above all else'. Terminology develops independently of your or my preferences and by this point UB has a specific meaning related to its usage in the standard. If you use the term in a similar context but to mean something else, you introduce a trivially avoidable ambiguity and point of potential confusion. That's all. – pvg Nov 12 '17 at 18:49
  • @pvg: If you will notice, my answer does not use the term “undefined behavior.” It explicitly spells out “the behavior is undefined.” That is as plain as can be, not jargon. – Eric Postpischil Nov 12 '17 at 22:43
  • Yeah I noticed, it's a distinction without a difference. Plus, the behaviour is not really undefined. Anyway, my point is not to get into a talmudic debate about this but to suggest a way you can improve your answer. If you don't want to, you don't want to. – pvg Nov 12 '17 at 22:46
  • @pvg: What do you mean the behavior is not undefined? What specification defines the behavior when you use more bytes than `mmap` provides? – Eric Postpischil Nov 12 '17 at 22:47

Your code violates the contract of mmap() by writing more than 1 byte into a memory mapping requested with size 1 byte.

However, as you have discovered, it may sometimes work on some systems. This might be because the size of one page (in the memory mapping) is e.g. 4 KB. So perhaps the mapping is larger than requested. Still, you have no right to use it as you have done.

So, stop doing that.


You asked if it should be a compilation error. The answer is no: the compiler does not have special cases for every library routine like mmap(). It does not know that the size parameter to mmap() means that the returned pointer is only valid for that many bytes. A static analyzer could possibly figure this out, but it would not be typical for a compiler to do so.

John Zwinck
  • The fundamental cause of the results the OP is seeing is that they misunderstood how lengths of data are managed. Notably, `printf` with `%s` uses a null character to stop, not other knowledge of string length, and `memcpy` uses the length given it, not any length associated with its other arguments. “Undefined behavior” is not the cause of the results the OP is seeing. (“Undefined behavior” can never be a **cause** of anything—since it is the lack of a definition, it can only explain why code was not guaranteed to behave in a certain way, not why it did behave in some way.) – Eric Postpischil Nov 12 '17 at 14:55
  • Note that you started by telling the OP they “invoked” undefined behavior, but you did not explain that their `memcpy` wrote 13 bytes, not 1, or that their `printf` used the values of the bytes to find the end, not other knowledge of the buffer size. It is apparent the OP thought that `memcpy` and/or `printf` would limit the length used to 1. Given that misbelief, they could not have diagnosed the fact that behavior was not defined. As long as they believed they were copying and printing one byte, they believed they were not exceeding the mapped buffer. – Eric Postpischil Nov 12 '17 at 14:58