11

Using C, is there a way to read only the last line of a file without looping over its entire content?

The thing is that the file contains millions of lines, each of them holding an integer (long long int). The file itself can be quite large, I presume even up to 1000 MB. I know for sure that the last line won't be longer than 55 digits, but it could be only 2 digits as well. Using any kind of database is out of the question... I've considered it already.

Maybe it's a silly question, but coming from a PHP background I find it hard to answer. I looked everywhere but found nothing clean.

Currently I'm using:

if ((fd = fopen(filename, "r")) != NULL) // open file
{
    fseek(fd, 0, SEEK_SET); // make sure start from 0
    while(!feof(fd))
    {
        memset(buff, 0x00, buff_len); // clean buffer
        fscanf(fd, "%[^\n]\n", buff); // read file *prefer using fscanf
    }
    printf("Last Line :: %d\n", atoi(buff)); // for testing I'm using small integers
}

This way I'm looping over the file's content, and as soon as the file gets bigger than ~500k lines things slow down pretty badly...

Thank you in advance. maxim

simonc
MAXIM

4 Answers

7

Just fseek to fileSize - 55 and read forward?

Reunanen
5

If there is a maximum line length, seek to that distance before the end. Read up to the end, and find the last end-of-line in your buffer.

If there is no maximum line length, guess a reasonable value, read that much at the end, and if there is no end-of-line, double your guess and try again.
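
The guess-and-double strategy for files with no known maximum line length could look roughly like the sketch below. The helper name `read_last_line` and the initial guess of 64 bytes are made up for illustration, and the code assumes that `fseek` with `SEEK_END` works on a binary stream (true on POSIX and Windows, although ISO C does not strictly require it). Trailing line terminators are skipped so that the last non-empty line is returned.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the last line of `path`
 * (without its line terminator), or NULL on error. The caller frees it. */
char *read_last_line(const char *path)
{
    char *buf = NULL;
    char *line = NULL;
    long size;
    long guess = 64;                 /* arbitrary first guess, in bytes */
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0)
        goto done;

    for (;;) {
        /* Never try to read more than the file holds. */
        long chunk = guess < size ? guess : size;
        char *tmp = realloc(buf, (size_t)chunk + 1);
        if (tmp == NULL)
            goto done;
        buf = tmp;

        if (fseek(fp, -chunk, SEEK_END) != 0)
            goto done;
        size_t len = fread(buf, 1, (size_t)chunk, fp);
        if (len != (size_t)chunk)
            goto done;

        /* Ignore the terminator(s) that end the last line. */
        while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r'))
            len--;

        /* Walk back to the newline that ends the previous line, if any. */
        size_t start = len;
        while (start > 0 && buf[start - 1] != '\n')
            start--;

        /* Done if we saw that newline, or if we already hold the whole file. */
        if (start > 0 || chunk == size) {
            line = malloc(len - start + 1);
            if (line != NULL) {
                memcpy(line, buf + start, len - start);
                line[len - start] = '\0';
            }
            goto done;
        }

        /* No newline yet: double the guess, capped at the file size. */
        guess = (chunk > size / 2) ? size : chunk * 2;
    }

done:
    free(buf);
    fclose(fp);
    return line;
}
```

With a known maximum line length the loop runs exactly once, which is the fixed-size case discussed next.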

In your case:

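/* max length including newline */
static const long max_len = 55 + 1;
/* space for all of that plus a nul terminator */
char buf[max_len + 1];

/* now read that many bytes from the end of the file
   (fd is a FILE*, so use fread; element size 1 makes it return a byte count) */
fseek(fd, -max_len, SEEK_END);
size_t len = fread(buf, 1, max_len, fd);

/* don't forget the nul terminator */
buf[len] = '\0';

/* if the file ends with a newline, drop it so we don't find that one */
if (len > 0 && buf[len - 1] == '\n')
    buf[--len] = '\0';

/* and find the last newline character; if there is none,
   the buffer starts exactly at the last line */
char *last_newline = strrchr(buf, '\n');
char *last_line = last_newline ? last_newline + 1 : buf;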
Useless
  • ok, got it. thanks. There might be more than one "\n" within the 55 buffer as it is not guaranteed that all the lines are 55 chars long, they can be 2 or 20, but no longer than 55. So at least one is there for sure. But that's the least of my problems. fseek() is exactly what I was looking for. Thank you! – MAXIM Dec 09 '12 at 19:18
  • I'm getting an error: invalid conversion from 'FILE* {aka _iobuf*}' to 'int' [-fpermissive] on line: ssize_t len = read(fd, buff, max_len); Any thought what I could be doing wrong? – MAXIM Dec 09 '12 at 19:36
  • ah, use `size_t len = fread(buf, 1, max_len, fd)` instead (element size 1, so the return value is the number of bytes read): you have a `FILE*` but I wrote it as if you were using an integer file descriptor (often called `fd`). – Useless Dec 09 '12 at 20:07
3

Open with "rb" to make sure you're reading binary. Then fseek(..., SEEK_END) and start reading bytes from the back until you find the first line separator (if you know the maximum line length is 55 characters, read 55 characters ...).
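
A minimal sketch of the read-backwards idea, assuming the stream was opened with "rb" and that seeking relative to the end works on it. The function name `seek_to_last_line` is hypothetical. It steps back one byte at a time, skips the terminator that ends the last line, and stops at the newline that ends the line before it, leaving the stream positioned at the start of the last line.

```c
#include <stdio.h>

/* Hypothetical helper: positions `fp` at the first byte of the last
 * non-empty line and returns that offset, or -1 on error.
 * `fp` is assumed to be open in binary mode ("rb"). */
long seek_to_last_line(FILE *fp)
{
    long pos;
    int seen_data = 0;   /* set once we have passed a non-terminator byte */

    if (fseek(fp, 0, SEEK_END) != 0)
        return -1;
    pos = ftell(fp);
    if (pos < 0)
        return -1;

    while (pos > 0) {
        /* Look at the byte just before `pos`. */
        if (fseek(fp, pos - 1, SEEK_SET) != 0)
            return -1;
        int c = fgetc(fp);
        if (c == EOF)
            return -1;

        if (c == '\n') {
            if (seen_data)
                break;           /* this newline ends the previous line */
        } else if (c != '\r') {
            seen_data = 1;
        }
        pos--;
    }

    /* Leave the stream at the start of the last line. */
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;
    return pos;
}
```

After the call, an ordinary `fgets(line, sizeof line, fp)` reads the last line. Each `fseek` is likely to throw away the stdio buffer, so when the maximum line length is known, one bulk read of the tail is usually the cheaper choice.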

Alexander Gessler
  • If I understood you correctly, then I read it from the back for 55 char, then as soon as I meet "\n" I break and reverse what's read backwards (since it'll be the other way around I suppose). – MAXIM Dec 09 '12 at 19:12
  • No need to read character-by-character ... just read the last 56 chars and find the last newline. – Useless Dec 09 '12 at 19:17
2

OK, it all worked for me. I learned something new. The last line of a 41 MB file with >500k lines was read instantly. Thanks to all of you, especially 'Useless' (love the controversy of your nickname, btw). I'll post the code here in the hope that someone else can benefit from it in the future:

Reading ONLY the last line of the file:

The file is structured so that every line, including the last one, ends with a newline, and I am sure that any line is shorter than, in my case, 55 characters:

file contents:
------------------------
2943728727
3129123555
3743778
412912777
43127787727
472977827

------------------------

Notice the trailing newline after the last value.

FILE *fd;                           // File pointer
char filename[] = "file.dat";       // file to read
static const long max_len = 55 + 1; // max length of a line, including its newline
char buff[max_len + 1];             // buffer for that many bytes plus a terminator

if ((fd = fopen(filename, "rb")) != NULL)  {      // open file. I omit error checks

    fseek(fd, -max_len, SEEK_END);            // set pointer to the end of file minus the length you need. Presumably there can be more than one newline character in that range
    fread(buff, max_len-1, 1, fd);            // read 55 bytes from where fseek() positioned us; this stops just short of the file's trailing newline
    fclose(fd);                               // close the file

    buff[max_len-1] = '\0';                   // close the string
    char *last_newline = strrchr(buff, '\n'); // find last occurrence of newline
    char *last_line = last_newline ? last_newline+1 : buff; // jump past it (or use the whole buffer if none was found)

    printf("captured: [%s]\n", last_line);    // captured: [472977827]
}
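
For anyone who wants the error checks that were omitted above, here is one possible sketch. The helper name `last_value` and the `MAX_LINE` macro are invented for this example. It assumes a line is at most 55 characters, that the file has at most one trailing line terminator, and that the value fits in a `long long` (a value with more than 19 digits will not, and is reported as a failure). It also copes with files shorter than the read window.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_LINE 55   /* longest line without its newline (assumption from the question) */

/* Hypothetical helper: stores the integer found on the last line of `path`
 * in *out. Returns 0 on success, -1 on an I/O or parse failure. */
int last_value(const char *path, long long *out)
{
    char buf[MAX_LINE + 2 + 1];      /* line + "\r\n" + nul terminator */
    long size, want = MAX_LINE + 2;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0) {
        fclose(fp);
        return -1;
    }
    if (want > size)
        want = size;                 /* file is shorter than the window */
    if (fseek(fp, -want, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    size_t len = fread(buf, 1, (size_t)want, fp);
    fclose(fp);

    /* Drop the terminator that ends the last line. */
    while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r'))
        len--;
    buf[len] = '\0';

    /* The last line starts just after the previous newline, if one is in view. */
    size_t start = len;
    while (start > 0 && buf[start - 1] != '\n')
        start--;
    if (start == len)
        return -1;                   /* nothing to parse */

    errno = 0;
    char *end;
    long long v = strtoll(buf + start, &end, 10);
    if (errno == ERANGE || end == buf + start || *end != '\0')
        return -1;                   /* out of range or not a plain integer */

    *out = v;
    return 0;
}
```

Usage would be along the lines of `long long v; if (last_value("file.dat", &v) == 0) printf("%lld\n", v);`.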

cheers! maxim

MAXIM