I have a large buffer:

char *buf = malloc(1000000000); // 1GB

If I forked a new process, its buf would share memory with the parent's buf until one or the other wrote to it. Even then, the kernel would only need to allocate one new 4 KiB page; the rest would continue to be shared.

I'd like to make a copy of buf, but I'm only going to change a little of the copy. I'd like copy-on-write behaviour without forking. (Like you get for free when forking.)

Is this possible?

fadedbee
  • Sure, but it won't be 'for free': you'll have to do your own memory management and keep track of changes. – Marc B Jun 12 '12 at 14:41
  • Yes, I want 'for free'. I was wondering whether there were any mmap based solutions, or maybe something I hadn't even imagined. – fadedbee Jun 12 '12 at 14:45
  • Perhaps mmap with MAP_ANONYMOUS and MAP_PRIVATE would do the job? – fadedbee Jun 12 '12 at 14:48
  • possible duplicate of [Can I do a copy-on-write memcpy in Linux?](http://stackoverflow.com/questions/1565177/can-i-do-a-copy-on-write-memcpy-in-linux) – ugoren Jun 12 '12 at 15:05
  • `1000000000` Byte is not 1 GB. It should be `1073741824` (1024 * 1024 * 1024). – qwertz Jun 12 '12 at 15:24
  • @wabepper it depends on whose definition you go by. According to microsoft windows, yes, that's the case, but, according to more recent versions of Mac OSX, it IS 1 GB. There isn't that big of a difference, and in most situations the difference is marginal, at the point which you are storing that amount of information. – Richard J. Ross III Jun 12 '12 at 15:35
  • Also, I just wanted to say MOO! – Richard J. Ross III Jun 12 '12 at 15:35
  • @wabepper (feeding the troll) 10^9 bytes == 1 GB, 2^30 bytes == 1 GiB. Giga is an SI prefix. – fadedbee Jun 16 '14 at 09:44
  • See also: https://stackoverflow.com/questions/16965505/allocating-copy-on-write-memory-within-a-process tl;dr: it's difficult on Linux. – Rusty Shackleford Jul 06 '15 at 16:48

1 Answer

You'll want to create a file on disk or a POSIX shared memory segment (shm_open) for the block. The first time, map it with MAP_SHARED. When you're ready to make a copy and switch to COW, call mmap twice more: once with MAP_FIXED and MAP_PRIVATE to map over the top of your original mapping, and once with MAP_PRIVATE alone (no MAP_FIXED) to make the second copy. This should get you the effect you want.
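The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked gist: the helper names `make_backing` and `cow_copy` are hypothetical, the backing object is an unlinked temporary file under `/tmp` (assumed writable) rather than a `shm_open` segment, and error handling is minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temporary file of `len` bytes to
 * back the mappings. Returns a descriptor, or -1 on failure.
 * Assumes /tmp exists and is writable. */
static int make_backing(size_t len)
{
    assert(len > 0);
    char path[] = "/tmp/cow-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                          /* the name is no longer needed */
    if (ftruncate(fd, (off_t)len) == -1) { /* a new file has size zero */
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: `buf` is an existing MAP_SHARED mapping of `fd`.
 * Remap it privately in place, then create a second private mapping.
 * Returns the second mapping, or MAP_FAILED. */
static void *cow_copy(void *buf, size_t len, int fd)
{
    /* MAP_FIXED with the old address: later writes via buf stay private. */
    if (mmap(buf, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_FIXED, fd, 0) == MAP_FAILED)
        return MAP_FAILED;

    /* No MAP_FIXED here: the copy needs its own virtual address. */
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
}
```

A caller would first map the descriptor with `mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)`, fill the buffer, and then call `cow_copy(buf, len, fd)`. Every `mmap` result should be compared against `MAP_FAILED` before use.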

R.. GitHub STOP HELPING ICE
  • That looks very encouraging, but I can't get it to work. I get a bus error (on line 13). fd == 3. Could you point out my stupid mistake? https://gist.github.com/2924412 – fadedbee Jun 13 '12 at 14:29
  • You need `ftruncate` to give the shared memory segment a size. The initial size is zero. – R.. GitHub STOP HELPING ICE Jun 13 '12 at 15:16
  • Thanks, I added an ftruncate and now have a segfault instead of a bus error, still at line 14. – fadedbee Jun 13 '12 at 15:26
  • I suspect the crash is actually at line 17 where you write to the buffer. Your debug printf's are useless because you're not flushing output and they don't even end in `\n`. – R.. GitHub STOP HELPING ICE Jun 13 '12 at 15:30
  • Or it could even be further down. Your use of `MAP_FIXED` is wrong. The second call to `mmap` should have `MAP_FIXED` if you intend to make further modifications through `buf`, but you need to pass `buf` instead of `NULL` as the first argument in that case. As it stands, both of your `mmap` calls with `MAP_FIXED` are trying to map at address 0, which probably fails (you're not checking any return values!), and then you end up trying to use `MAP_FAILED` (usually defined as `(void *)-1`) as if it were a valid pointer. – R.. GitHub STOP HELPING ICE Jun 13 '12 at 15:33
  • The final `mmap` call should NOT have `MAP_FIXED` since you want a new virtual address for it. – R.. GitHub STOP HELPING ICE Jun 13 '12 at 15:33
  • Thanks for all your help. This will take me a few minutes to get right. – fadedbee Jun 13 '12 at 15:42
  • It works! https://gist.github.com/2924412 What was the point of the commented-out remapping of buf? I don't seem to need it. Many thanks. – fadedbee Jun 13 '12 at 15:59
  • If you don't do the re-mapping of `buf` and you modify `buf` *after* making the new private mapping but *before* modifying the corresponding page in `buf2`, it's possible that your changes to `buf` will wrongly show up in `buf2`. If you never intend to modify `buf` after making `buf2`, you can skip re-mapping `buf` private. But if you want to be able to modify them both without making your program randomly misbehave, you need the private re-mapping of `buf`. – R.. GitHub STOP HELPING ICE Jun 13 '12 at 16:10
  • Thanks, that explains it. I just need a series of 'frozen' buffers and one which I can mutate. – fadedbee Jun 14 '12 at 10:46
  • AFAIU, it does not work. It might work on some OSes but on Linux, when you modify the original MAP_SHARED mapping, the MAP_PRIVATE will still see the pages backing the real file. The MAP_PRIVATE pages will only "fork" from the file when the data is modified via the MAP_PRIVATE VMA. – ysdx Jun 10 '15 at 13:19
  • Not if you write to the private map first. – R.. GitHub STOP HELPING ICE Jun 10 '15 at 20:33
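The caveat raised in the last few comments can be probed with a small sketch like the one below. It is an illustration only: the struct `cow_probe` and the function `probe_private_visibility` are hypothetical names, and the backing object is assumed to be an unlinked temporary file under `/tmp`. POSIX leaves it unspecified whether a write to the underlying object made after a `MAP_PRIVATE` mapping is established is visible through a private page that has not yet been written, so the first field is reported rather than relied on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result record for the probe below. */
struct cow_probe {
    int untouched_page_sees_shared_write; /* unspecified by POSIX */
    int touched_page_sees_shared_write;   /* expected to be 0 */
    int private_write_leaked_to_shared;   /* expected to be 0 */
};

/* Maps a two-page temporary file once MAP_SHARED and once MAP_PRIVATE.
 * Page 0 of the private mapping is never written; page 1 is written first.
 * Returns 0 on success, -1 on setup failure. Assumes /tmp is writable. */
static int probe_private_visibility(struct cow_probe *out)
{
    assert(out != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t len = 2 * (size_t)ps;

    char path[] = "/tmp/cow-probe-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }

    volatile char *shared = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    volatile char *priv = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE, fd, 0);
    close(fd); /* the mappings keep the file alive */
    if (shared == MAP_FAILED || priv == MAP_FAILED)
        return -1;

    shared[0] = 'o';
    shared[ps] = 'o';

    priv[ps] = 'p'; /* touch page 1 privately: it now has its own copy */
    out->private_write_leaked_to_shared = (shared[ps] == 'p');

    shared[0] = 'S';  /* page 0 of priv is still untouched */
    shared[ps] = 'S'; /* page 1 of priv was already copied */
    out->untouched_page_sees_shared_write = (priv[0] == 'S');
    out->touched_page_sees_shared_write = (priv[ps] == 'S');

    munmap((void *)shared, len);
    munmap((void *)priv, len);
    return 0;
}
```

Printing the three fields after a successful call shows how a given system treats the untouched page. This is why the answer suggests re-mapping `buf` as `MAP_PRIVATE` if it will be modified after the copy is made.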