4

I need to open a file and load it in shared memory via mmap, but if the file does not exist yet, I want to open it, write some (fake) data to it, and then mmap it. I wrote the following function in C, but I'm getting an error in the write (see below). (I know the mmap part is probably wrong (data is assigned twice!), but the error happens before that, so it should not have any influence on this issue).

// These 2 are global so they can be referenced in other functions.
int dfd = -1;
long* data = NULL;

void load_data(char* filename)
{
  dfd = open(filename, O_RDONLY);

  if (dfd == -1) {

    printf("Creating file %s\n", filename);

    dfd = open(filename, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);

    if (dfd == -1) {
      fprintf(stderr, "Couldn't create file %s\n", filename);
      perror("create");
      exit(1);
    }

    data = (long *) valloc(M * GB);

    if (data == nullptr) {
      fprintf(stderr, "Couldn't allocate %ld bytes", (M * GB));
      perror("malloc");
      exit(1);
    }

    for (size_t i = 0; i < M * GB / sizeof(long); ++i)
      data[i] = (long) i;

    printf("%d %p %ld\n", dfd, data, M * GB);

    ssize_t w = write(dfd, (void*) data, M * GB);

    if (w != M * GB) {
      fprintf(stderr, "Couldn't write %ld bytes to file %s\n", (M * GB), filename);
      fprintf(stderr, "Wrote %ld bytes\n", w);
      perror("write");
      exit(1);
    }
  }

  data = (long *) mmap(0, M * GB, PROT_READ, MAP_SHARED, dfd, 0);

  if (data == MAP_FAILED) {
    perror("mmap");
    exit(1);
  }
}

Output and error on MacOS 64 bits, Apple g++:

Creating file bench2_datafile.bin
3 0x101441000 2147483648
Couldn't write 2147483648 bytes to file bench2_datafile.bin
Wrote -1 bytes
write: Invalid argument

Any pointer? I keep reading the open and write doc, and looking for examples on the internet, but I can't seem to get over this error.

After benefiting from comments:

Output on RHEL 6, g++ 4.8:

Creating file bench2_datafile.bin
3 0x7f79048af000 2147483648
write: Success
Couldn't write 2147483648 bytes to file bench2_datafile.bin
Wrote 2147479552 bytes

and 2147479552 is indeed the file size in ls.

Also, it works on Mac with 1 GB - but it runs out of steam with 2 GB. Oh well - my real target is Linux anyway, it was just more convenient to work on the Mac till I got the bugs out :-)

underscore_d
Frank
  • 1
    why the use of `open` instead of `fopen` ? you could test if the file exists with `fopen(filepath, "r")` and if it doesn't exist use `fopen(filepath, "w")` to write to it, afterwards continue the way you do when the file exists – Meik Vtune Jul 26 '16 at 09:41
  • 2
    Don't call other functions between a failed syscall and perror, you might reset errno and get meaningless error printouts. Make sure you have large file support enabled. – Mat Jul 26 '16 at 09:41
  • 3
    @MeikVtune *why the use of open instead of fopen ?* Because `mmap()` requires an `int` type file descriptor like that returned by `open()`. Also, `fopen()`/`fwrite()` buffers write operations - which isn't necessary in this case. – Andrew Henle Jul 26 '16 at 09:43
  • I tried fopen, but I read somewhere else that fopen and mmap shouldn't be mixed - fopen returns a FILE*, so you have to call fileno to get an int to pass to mmap, and apparently it's not good to mix low-level open/mmap with C library fopen. – Frank Jul 26 '16 at 09:43
  • @MeikVtune: testing then creating is a classic race condition. – Karoly Horvath Jul 26 '16 at 09:44
  • 1
    What operating system? Is this a 32- or 64-bit executable? – Andrew Henle Jul 26 '16 at 09:44
  • 1
    Also, if you use mmap, you are probably not working with streams, but rather with random accesses, for which you don't want the buffering behavior of fopen. – Frank Jul 26 '16 at 09:44
  • 1
    This is on MacOS, 64 bits, compiled with g++ Apple LLVM version 7.0.0 (clang-700.1.76) – Frank Jul 26 '16 at 09:45
  • 1
    You haven't run into this bug yet, but when you do write the data, `mmap()` is going to fail. You open the file when you write it with the `O_WRONLY` flag, but then (after a successful `write()`) your code calls `mmap()` with the `PROT_READ` flag. That combination of a write-only file descriptor and read access to the `mmap()`'d file won't work. – Andrew Henle Jul 26 '16 at 09:51
  • As for why `write()` is failing, it's not documented in [the man page](https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man2/pwrite.2.html), but I suspect `write()` operations on Mac OS as large as you're trying don't work. – Andrew Henle Jul 26 '16 at 09:52
  • Actually, you can use fopen. You can get the file descriptor from a FILE* by using fileno. See https://stackoverflow.com/questions/3167298/how-can-i-convert-a-file-pointer-file-fp-to-a-file-descriptor-int-fd. – mwk Jul 26 '16 at 09:57
  • @AndrewHenle I think that's it too. Read my answer. The real issue with this code is that you can't check `w != M * GB` because that's not guaranteed, that IS IN THE DOCUMENTATION. Instead, check `w != -1` and `errno` if true, if false call `write` again to write the remaining bytes, repeat until you have written all the bytes. – Iharob Al Asimi Jul 26 '16 at 09:57
  • @mwk Using `fopen()` will not help. – Iharob Al Asimi Jul 26 '16 at 09:58
  • On RHEL 6, it works, but I don't get quite what I want - I'll add details in the main post for readability. – Frank Jul 26 '16 at 10:00
  • @Frank, on Linux it works. It's also documented in the manual page. – Iharob Al Asimi Jul 26 '16 at 10:01
  • @iharob From the man page: "[EINVAL] The pointer associated with fildes is negative." That really doesn't make much sense (how can a *pointer* be negative?) But if there's a 32/64-bit bug in the implementation along with the obvious problems with the man page, 2147483648 will be a negative 32-bit signed value. – Andrew Henle Jul 26 '16 at 10:02
  • @AndrewHenle Which manual pages are you reading? I have my local man-pages on my Fedora 23 and it doesn't say anything like that. – Iharob Al Asimi Jul 26 '16 at 10:04
  • @iharob The Mac OS X one here: https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man2/pwrite.2.html – Andrew Henle Jul 26 '16 at 10:04
  • 4
    @Frank From [the Linux man page](http://man7.org/linux/man-pages/man2/write.2.html): *On Linux, write() (and similar system calls) will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred. (This is true on both 32-bit and 64-bit systems.)* – Andrew Henle Jul 26 '16 at 10:06
  • @AndrewHenle I think you're right. But still my claim that checking `errno` is necessary holds. So @[Frank](http://stackoverflow.com/users/759880/frank) please, do check if `write()` returns `-1` and then check `errno`. – Iharob Al Asimi Jul 26 '16 at 10:06
  • @AndrewHenle So my answer is correct. – Iharob Al Asimi Jul 26 '16 at 10:07
  • @iharob - I'm not going to check errno - but rather the number of bytes actually written, so that I can continue writing the remaining ones in a loop. Or I can also check errno at each iteration of the loop, but you don't want to lose track of how many bytes were actually written in each iteration. – Frank Jul 26 '16 at 10:09
  • Thanks to all - I think I have a good idea of what is going on. To sum up: 1. on MacOSX, there is some kind of limitation that crashes write (but ultimately, I work on Linux) ; 2. on Linux, it works, but the number of bytes that can be written with each call of write is actually 0x7ffff000, so I'm going to have to iterate to write all the bytes I want ; 3. I need to fix the mmap as per comment above ; 4. I shall not call other functions between a failed sys call and perror :-) Thanks! – Frank Jul 26 '16 at 10:11
  • You MUST check `w` for `-1` because it's a value that `write()` returns on error. Then if it's not `-1` you can check how many bytes were actually written. If it was `-1` check `errno`. – Iharob Al Asimi Jul 26 '16 at 10:11
  • @iharob - yeah, yeah - I think everybody got it :-) – Frank Jul 26 '16 at 10:12
  • @Frank, it's always a good idea to write something like `while (remaining_bytes > 0) {written = write( ... ); check_error_and_handle(written); remaining_bytes -= written; ...};` – Iharob Al Asimi Jul 26 '16 at 10:13
  • @iharob Sure, it's correct, but it leaves the specific 2 GB limit encountered here as an exercise for the reader. Andrew's comment is more succinct in that respect. But you make very good general points. – underscore_d Jul 26 '16 at 10:21
  • @AndrewHenle I think you should write an answer as the currently accepted one looks more like guesswork than the perfectly specific `man` page reference you cited. – underscore_d Jul 26 '16 at 11:10
  • You don't need to loop to write more than 2GB. [How can I write/create a file larger than 2GB by using C/C++](http://stackoverflow.com/q/10042191/995714), http://stackoverflow.com/q/11169202/995714, http://stackoverflow.com/q/23037130/995714, http://stackoverflow.com/q/730709/995714 – phuclv Jul 26 '16 at 11:15
  • Does this answer your question? [cannot write(2) file larger than 2GB (up to 2TB)](https://stackoverflow.com/questions/44939835/cannot-write2-file-larger-than-2gb-up-to-2tb) – phuclv Jun 23 '23 at 02:18
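
The loop suggested in the comments above could look roughly like the following. This is only a sketch: `write_all` is a hypothetical helper name, and the 1 GiB per-call cap is an assumption chosen to stay below the Linux limit of 0x7ffff000 bytes quoted from the man page and below the size that failed on the Mac in the question.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly `count` bytes from `buf` to `fd`.
 * Returns 0 on success, -1 on error (errno is left as set by write()). */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    /* Assumed per-call cap of 1 GiB; not a documented constant. */
    const size_t chunk_max = (size_t)1 << 30;

    while (count > 0) {
        size_t chunk = count < chunk_max ? count : chunk_max;
        ssize_t w = write(fd, p, chunk);

        if (w == -1) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;          /* real error: caller can call perror() */
        }
        if (w == 0) {
            errno = EIO;        /* no progress: avoid looping forever */
            return -1;
        }
        p += w;                 /* short write: advance and continue */
        count -= (size_t)w;
    }
    return 0;
}
```

In the question's code this would replace the single `write(dfd, data, M * GB)` call, with `perror("write")` called immediately when the helper returns -1. As noted in the comments, the file would also need to be opened with `O_RDWR` rather than `O_WRONLY` for the later `mmap()` with `PROT_READ` to succeed.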

1 Answer

2

Many platforms use 32-bit values for file positions. In addition, the interface requires the value to be signed. That means that you can get into trouble whenever you want to handle files larger than 2 GB.

Some platforms provide non-standard functions for manipulating larger files.

You need to check the platform documentation to see what goes for the platform(s) you want to target.
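
One quick way to check what a given platform does is to look at the width of `off_t`. The sketch below assumes a glibc-based Linux, where defining `_FILE_OFFSET_BITS` to 64 before any include requests 64-bit file offsets on 32-bit builds; other platforms may ignore the macro. `has_large_file_offsets` is a hypothetical helper name.

```c
/* glibc feature-test macro (assumption: glibc); must come before all includes. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if off_t is at least 64 bits wide,
 * i.e. file positions past 2 GB are representable, and 0 otherwise. */
int has_large_file_offsets(void)
{
    return sizeof(off_t) * CHAR_BIT >= 64;
}
```

A 64-bit `off_t` only removes the file-position limit. It does not lift the per-call cap on `write()` discussed in the comments on the question.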

Klas Lindbäck