2

I'm trying to truly understand the use of lseek() while creating a file of the needed size. So I wrote this code whose only goal is to create a file of the size given in the input.

Running for example:

$ ./lseek_test myFile 5

I would expect it to create a file named myFile of 5 bytes whose last byte is occupied by the number 5. What I get is a file I can't even access. What's wrong? Did I misunderstand how lseek() is meant to be used?

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>

#define abort_on_error(cond, msg) do {\
    if(cond) {\
        int _e = errno;\
        fprintf(stderr, "%s (%d)\n", msg, _e);\
        exit(EXIT_FAILURE);\
    }\
} while(0)

/* Write an integer with error control on the file */
void write_int(int fd, int v) {
    ssize_t c = write(fd, &v, sizeof(v));
    if (c == sizeof(v))
        return;
    abort_on_error(c == -1 && errno != EINTR, "Error writing the output file");
    abort_on_error(1, "Write operation interrupted, aborting");
}

int main(int argc, char *argv[]) {
    // Usage control
    abort_on_error(argc != 3, "Usage: ./lseek_test <FileName> <FileSize>");

    // Parsing of the input
    int size = strtol(argv[2], NULL, 0);
    // Open file
    int fd = open(argv[1], O_RDWR|O_CREAT, 0644);
    abort_on_error(fd == -1, "Error opening or creating file");

    // Use lseek() and write() to create the file of the needed size
    abort_on_error(lseek(fd, size, SEEK_SET) == -1, "Error in lseek");
    write_int(fd, size); // To truly extend the file 

    //Close file
    abort_on_error(close(fd) == -1, "Error closing file");
    return EXIT_SUCCESS;
}
Robb1
  • You have too many customized functions in that code. Please make a [minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) if you want help debugging your code. – giusti Jan 05 '17 at 17:14
  • What do you mean by "a file I can't even access"? – Mat Jan 05 '17 at 17:14
  • 1
    It works for me on Linux / Debian. The file is actually size + (sizeof(int)) bytes, but I can dump it out (it's all zeros, plus the size at the end). I would probably avoid the abort_on_error() macro...it's not structured exactly correctly for doing what you want, and that sort of thing often ends up biting somebody (maybe not you) in the ass later. – Dave M. Jan 05 '17 at 17:15
  • 1
    @yano: seeking & writing is a valid way to extend a file. The code above works (doesn't do exactly what OP wants but close). – Mat Jan 05 '17 at 17:16
  • @yano, I think you're missing the point, which is that with `lseek()` you can seek not only to the end of the file, but any number of bytes *past* the end of the file, just as the OP is doing. His program is perfectly conforming (to POSIX) as far as I see; I think he just has the wrong expectation of the results. – John Bollinger Jan 05 '17 at 17:51
  • @JohnBollinger "I think you're missing the point".. Apparently! The thought of `lseek`ing beyond the end of a file has never occurred in my brain and sounds like dangerous territory, but there's clearly a lot I don't know. I'll stick to my more comfortable append methods. – yano Jan 05 '17 at 18:02
  • Cannot reproduce. A stable Debian produces `myFile` like this: `00 00 00 00 00 05 00 00 00`. This is a 4-byte integer at offset 5, as sought to, which leads to a file size of 9. – alk Jan 05 '17 at 19:20
  • Thanks everybody for the help. @giusti I tried to reduce the length of the code. – Robb1 Jan 06 '17 at 10:38

2 Answers

7

Your program works for me exactly as I would expect, based on its implementation:

  • supposing that the named file does not initially exist, it creates it
  • it writes the 4 bytes of an int (sizeof(int)) having value 5 into the file, starting at offset 5
  • it writes nothing at offsets 0 - 4; these are filled with null bytes.

The result is a nine-byte file, with byte values (not printable digits):

0 0 0 0 0 5 0 0 0

(My system is little-endian.) Note in particular that this file is not a text file in any sense. If you expected a text file, as seems to be the case, then a text-oriented tool may well display it strangely or refuse to show it, which you might describe as not being able to access it.
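The layout described above can be checked with a minimal sketch like the following. The helper names `write_int_at` and `read_bytes` are hypothetical (they are not from the question's code), and `O_TRUNC` is added here only so that repeated runs start from an empty file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or empty) `path`, seek to `offset`, and write
 * the raw bytes of `value` there. Returns the resulting file size, or -1. */
off_t write_int_at(const char *path, off_t offset, int value) {
    assert(path != NULL && offset >= 0);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    off_t end = -1;
    /* Seeking past the end is allowed; the gap reads back as zero bytes. */
    if (lseek(fd, offset, SEEK_SET) != -1 &&
        write(fd, &value, sizeof value) == (ssize_t)sizeof value)
        end = lseek(fd, 0, SEEK_END);
    if (close(fd) == -1)
        return -1;
    return end;
}

/* Hypothetical helper: read up to `cap` bytes of `path` into `buf`.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_bytes(const char *path, unsigned char *buf, size_t cap) {
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    ssize_t n = read(fd, buf, cap);
    close(fd);
    return n;
}
```

Calling `write_int_at("myFile", 5, 5)` should return `5 + sizeof(int)`, and the bytes returned by `read_bytes` can then be printed one by one with `printf("%02x ", buf[i])` to see the zero-filled gap followed by the integer's raw representation.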

Some considerations, then:

  • The fifth byte of a file is at offset 4 from the beginning, not 5.
  • If you want to write the digit '5' then store it in a char and write that char; do not write its int representation. Alternatively, wrap your file descriptor in a stream and use stream I/O functions, such as fputc().
  • If you want to fill the other space with anything other than null bytes then you'll need to do that manually.
  • As far as I can determine, this is all as required by POSIX. In particular, it says this of lseek:

The lseek() function shall allow the file offset to be set beyond the end of the existing data in the file. If data is later written at this point, subsequent reads of data in the gap shall return bytes with the value 0 until data is actually written into the gap.

(POSIX 1003.1-2008, 2016 Edition)
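Putting the considerations above together, here is a rough sketch of what the question seems to be after: a file of exactly the requested size whose last byte is a printable character. The function names `create_sized_file` and `file_size` are made up for illustration, and truncating an existing file with `O_TRUNC` is an assumption about the desired behavior.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` with exactly `size` bytes, all zero
 * except the last one, which holds the character `last`.
 * Returns 0 on success, -1 on error. */
int create_sized_file(const char *path, off_t size, char last) {
    assert(path != NULL && size > 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    int rc = -1;
    /* The last byte of a `size`-byte file lives at offset size - 1. */
    if (lseek(fd, size - 1, SEEK_SET) != -1 && write(fd, &last, 1) == 1)
        rc = 0;
    if (close(fd) == -1)
        rc = -1;
    return rc;
}

/* Hypothetical helper: report the size of `path` via stat(), or -1. */
off_t file_size(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : -1;
}
```

With this sketch, `create_sized_file("myFile", 5, '5')` writes a single `char` at offset 4, so the file ends up 5 bytes long rather than 9.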

John Bollinger
  • Thank you very much for the clarifying answer! Yes, I was wrongly expecting a *text file*. But I still have a question: **how to open a non-text file as the one above?** – Robb1 Jan 06 '17 at 11:43
  • 1
    @Robb1, programmatically, you can open such a file with `open()`, `fopen()`, or similar, as you already did. If you're asking about opening it with other programs then there is a wide range of programs that can do so. Some programs might require special options or commands, but that's program-dependent. I used `vim` myself, with no special options, though `vim -b` would actually have been more appropriate. – John Bollinger Jan 06 '17 at 14:27
-2

On some (very old?) systems lseek will not allow you to seek past the end of the file, and if you attempt it, you'll get an EINVAL error.

Instead, you want to use ftruncate to change the file size first, and then use lseek to seek to where in the file you want to read (or write). For your example:

ftruncate(fd, 5);          // set file size to 5
lseek(fd, 0, SEEK_END);    // reposition to the new end
write(fd, &v, sizeof(v));  // write data (extending the file)
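For reference, a self-contained sketch of the ftruncate() step on its own might look like this. The name `resize_file` is hypothetical, and the sketch assumes a system where ftruncate() can grow a file as well as shrink it, with the new region reading back as zero bytes.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set the size of `path` to `size` with ftruncate(),
 * creating the file if needed. Returns the resulting size, or -1 on error. */
off_t resize_file(const char *path, off_t size) {
    assert(path != NULL && size >= 0);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    off_t result = -1;
    struct stat st;
    /* ftruncate() changes the size directly; no write() is needed. */
    if (ftruncate(fd, size) == 0 && fstat(fd, &st) == 0)
        result = st.st_size;
    if (close(fd) == -1)
        return -1;
    return result;
}
```

Unlike the seek-and-write approach, this sets the size without writing any data, so `resize_file("myFile", 5)` should report a 5-byte file.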
Chris Dodd
  • The lseek() function allows the file offset to be set beyond the end of the file (but this does not change the size of the file). If data is later written at this point, subsequent reads of the data in the gap (a "hole") return null bytes ('\0') until data is actually written into the gap. **(From linux lseek(2) man page)** – Dave M. Jan 05 '17 at 17:32
  • @DaveM. That's OS specific -- by POSIX, it's not allowed to seek past the end. – Chris Dodd Jan 05 '17 at 17:35
  • 2
    @ChrisDodd, you are mistaken. POSIX in fact has wording very similar to the Linux manual's. See http://pubs.opengroup.org/onlinepubs/009695399/functions/lseek.html – John Bollinger Jan 05 '17 at 17:39
  • It seemed odd that such a basic thing would be different between Linux & POSIX; I'm glad my uneducated assumption wasn't so far off. But it begs the question: what system is OP using, that this common Linux/POSIX thing doesn't work? – Dave M. Jan 05 '17 at 17:42
  • I guess my age is showing -- as the link shows, this was changed way back in the 2004 POSIX standard. HP/UX is well and truly dead... – Chris Dodd Jan 05 '17 at 18:34