18

What would be the most efficient way to read a text file into a dynamic one-dimensional array? Calling realloc after every character read seems silly, and calling it after every line doesn't seem much better. I would like to read the entire file into the array. How would you do it?

skaffman
diminish
  • I might have misunderstood what you want to do: Do you want to just read the whole file into a big buffer, or do you want an array with an entry for each line? – Christoph Jan 04 '09 at 17:05

3 Answers

26

I don't quite understand what you want. Do you want to process the file incrementally, reading one line, discarding it, and then processing the next? Or do you want to read the entire file into a buffer? If you want the latter, I think this is appropriate (in real code, check fopen and malloc for a NULL return, which tells you whether the file exists and whether you got enough memory):

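FILE *f = fopen("text.txt", "rb"); // binary mode: bytes are read untranslated
fseek(f, 0, SEEK_END);             // jump to the end of the file...
long pos = ftell(f);               // ...so the offset is the file size in bytes
fseek(f, 0, SEEK_SET);             // rewind before reading

char *bytes = malloc(pos);         // exactly pos bytes; not NUL-terminated
fread(bytes, pos, 1, f);           // read the whole file as one item of pos bytes
fclose(f);

hexdump(bytes); // do some stuff with it (hexdump stands in for your own code)
free(bytes); // free allocated memory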
Johannes Schaub - litb
  • Yes, that would apply to my case. I meant that using realloc after each read char seems very inefficient, similarly after every read \n (to extend the array). – diminish Jan 04 '09 at 13:15
  • You should open the file in binary mode - there might be problems otherwise (check eg. glibc manual, 12.17) – Christoph Jan 04 '09 at 13:40
  • oh, thanks. i had no idea that it makes *that* much of a difference. – Johannes Schaub - litb Jan 04 '09 at 14:59
  • On POSIX systems, it shouldn't. But I'm pretty sure I once stumbled upon a bug which went away after switching to binary mode - but I can't remember what the exact issue was... – Christoph Jan 04 '09 at 15:33
  • Re: binary-mode - Control-Z can cause trouble on Windows. General point: You could consider using 'stat()' or 'fstat()' to tell you the file size. Also, beware gargantuan files (larger than 2 GB); long may not work reliably. – Jonathan Leffler Jan 04 '09 at 16:58
  • The stat functions are not part of the C standard; fseek()/ftell() is the only way I know of to get the size of a file if you want to use ISO C. – Christoph Jan 04 '09 at 17:20
  • i'm also unaware of any other way to get the filesize in standard C. But i seriously doubt he's loading a whole file with 2^32 bytes in memory using that method – Johannes Schaub - litb Jan 04 '09 at 17:22
  • 1
    hi, what is the difference between (let's assume we use 100 instead of pos) char *bytes = malloc(100*sizeof(char)); and above line where you have written char *bytes = malloc(100); second question is that what if my file has 180205962 characters in it. will the above way of reading the file would be efficient? – asel Dec 30 '09 at 20:56
  • 1
    @asel, first question: `sizeof(char)` is defined to be 1, so there is no difference. Second question: no, you probably should read it incrementally (like, line-by-line, or some other piecewise method). Otherwise, your memory will quickly become exhausted. – Johannes Schaub - litb Dec 30 '09 at 21:37
  • 1
    Using fseek/ftell to get the file's size is insecure. See this CERT reference for why that is and how to do it securely: https://www.securecoding.cert.org/confluence/display/seccode/FIO19-C.+Do+not+use+fseek()+and+ftell()+to+compute+the+size+of+a+file – Bryan Dec 09 '12 at 23:31
  • @Bryan: Thank you for that link. p.s.: that page has apparently moved to https://www.securecoding.cert.org/confluence/display/c/FIO19-C.+Do+not+use+fseek%28%29+and+ftell%28%29+to+compute+the+size+of+a+regular+file – David Cary Jan 03 '16 at 20:14
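Several comments above question whether fseek()/ftell() reliably gives the file size. One way to avoid needing the size at all, and still avoid a realloc per character or per line, is to read in large chunks and double the buffer whenever it fills. The following is a minimal sketch of that idea, not code from the thread; the helper name `read_all` and the 4096-byte starting capacity are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything remaining in `stream` into a
 * malloc'd buffer. On success returns the buffer (NUL-terminated for
 * convenience) and stores the byte count in *out_len. On failure returns
 * NULL. The caller frees the result. */
char *read_all(FILE *stream, size_t *out_len)
{
    size_t cap = 4096;   /* arbitrary starting capacity (an assumption) */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Fill the free space, keeping one byte back for the terminator. */
        size_t n = fread(buf + len, 1, cap - len - 1, stream);
        len += n;
        if (len < cap - 1)
            break;                      /* short read: end of file or error */

        /* Buffer is full: double it, so the number of realloc calls
         * grows only logarithmically with the file size. */
        if (cap > (size_t)-1 / 2) {     /* doubling would overflow size_t */
            free(buf);
            return NULL;
        }
        cap *= 2;
        char *tmp = realloc(buf, cap);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
    }

    if (ferror(stream)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

Because it never asks for the size, this also works on streams that cannot seek, such as stdin or a pipe. The trade-off is that the final buffer may be up to twice as large as the data; a last realloc down to len + 1 would trim it if that matters.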
12

If mmap(2) is available on your system, you can open the file and map it into memory. That way, you have no memory to allocate, and you don't even have to read the file yourself; the system does it. You can use the fseek() trick litb gave to get the size.

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

EDIT: You have to use lseek() to obtain the size of the file.

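int fd = open("filename", O_RDONLY);
off_t nbytes = lseek(fd, 0, SEEK_END); // file size in bytes; off_t avoids truncating large sizes
// MAP_PRIVATE: writes go to a private copy and never reach the file
void *content = mmap(NULL, nbytes, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);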
saffsd
philant
  • @saffsd you have enough rep to fix it, you know how it works here. – philant Sep 05 '14 at 06:33
  • forgot about that, fixed and deleted comment. – saffsd Sep 06 '14 at 08:34
  • 1
    A possibly more idiomatic way to get the file size is to use `fstat(2)` function: `struct stat S; fstat(fd, &S);`, then `int nbytes = S.st_size` is the file size in bytes, direct from the filesystem, without any reads of the file (this would doubtless get the same result as above; I mention it largely for completeness). – Norman Gray Oct 28 '15 at 22:13
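Putting the mmap() call together with the fstat() suggestion from the comment above, a complete version with error checks might look like the following. This is a sketch for POSIX systems only; the helper name `map_file` is hypothetical, and it maps the file read-only rather than with the PROT_WRITE flag used in the answer.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the file at `path` read-only.
 * Returns a pointer to the mapping and stores its length in *out_len,
 * or returns NULL on failure. An empty file also yields NULL, because
 * mmap() rejects a length of zero. */
const char *map_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *out_len = (size_t)st.st_size;
    return p;
}
```

When the contents are no longer needed, release the mapping with munmap((void *)ptr, len). Note that the mapped bytes are not NUL-terminated, so string functions that expect a terminator should not be used on them directly.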
1

If you want to use ISO C, use this function.

It's litb's answer, wrapped with some error handling...
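The function this answer refers to did not survive in the text above. As a stand-in, here is a minimal sketch of what litb's approach looks like with error handling added, using only ISO C library calls. It is not Christoph's original code; the name `slurp` and the choice to NUL-terminate the buffer are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a malloc'd,
 * NUL-terminated buffer and stores the byte count in *out_len.
 * Returns NULL if the file cannot be opened, sized, allocated, or read.
 * The caller frees the result. */
char *slurp(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    char *buf = NULL;
    long size;

    /* Seek to the end to learn the size, then rewind. */
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0)
        goto done;

    buf = malloc((size_t)size + 1);     /* +1 for the terminator */
    if (buf == NULL)
        goto done;

    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);                      /* short read: treat as failure */
        buf = NULL;
        goto done;
    }
    buf[size] = '\0';
    *out_len = (size_t)size;

done:
    fclose(f);
    return buf;
}
```

A call such as `char *text = slurp("text.txt", &len);` then needs only one NULL check. As the comments on litb's answer point out, the C standard does not require a binary stream to support SEEK_END meaningfully and long may be too small for very large files, so this works on common platforms but is not guaranteed everywhere.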

Community
Christoph