What is the simplest way to read a 10GB binary file and parse each byte without resorting to boost libraries?

It's so confusing with streams, files, memory mapped files etc.

I literally just want something like:

char* buffer = read(filename, binary);

while(buffer != EOF){
    //Read byte
    ++buffer;
}

Performance does matter simply due to the file size.

intrigued_66
  • What's wrong with std::ifstream? – Robert Prévost Sep 24 '16 at 01:32
  • For simple chunks I always fall back to C and unbuffered FILE* handling. – Lothar Sep 24 '16 at 01:36
  • Memory mapped file is the best. – Jichao Sep 24 '16 at 01:46
  • @Jichao: Why do you say memory mapped file is the best? – John Zwinck Sep 24 '16 at 01:48
  • @Jichao Define 'best', and state your reasons. Memory mapping this file is a way to waste 10GB of virtual memory when reading 4-8k at a time is possibly quite adequate. – user207421 Sep 24 '16 at 01:49
  • According to [my experience](http://stackoverflow.com/questions/2171625/how-to-scan-through-really-huge-files-on-disk), maybe memory mapped file is the fastest way to do this. – Jichao Sep 24 '16 at 01:53
  • @Jichao (1) The OP specifically said performance wasn't a concern; (2) does your experience include 10GB files? – user207421 Sep 24 '16 at 01:56
  • *"Its so confusing with streams, files, memory mapped files etc."* - You will have to try it yourself and benchmark them, then use the fastest on your platform. Libraries, OSes, compilers, memories {disks, SSDs, RAM etc.}, DMA implementations etc. are still evolving in ways that make certain things faster... We are no longer in the days where we can arbitrarily tell you which is fastest without seeing actual implementations. – WhiZTiM Sep 24 '16 at 02:00
  • @EJP (1) `Performance does matter` (2) for sequential access, memory mapped files should be the fastest way. – Jichao Sep 24 '16 at 02:01

2 Answers

If you want good performance for sequential access (reading from the beginning toward the end), use fread(). You can store the FILE* in a std::shared_ptr for RAII:

std::shared_ptr<FILE> file(fopen(...), fclose);

You can ignore C++ streams, memory mapped files, Boost, etc. None of that will be faster than fread().
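
A minimal sketch of what the fread() loop could look like, reading the file in fixed-size chunks and handing each byte to a callback. The names `FileCloser`, `FilePtr` and `for_each_byte`, and the 1 MiB default chunk size, are assumptions made for illustration and are not part of the answer above. It also swaps in std::unique_ptr for the std::shared_ptr shown above, because a unique_ptr never calls its deleter on a null pointer, so a failed fopen() does not end up in fclose(NULL).

```cpp
#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

// Hypothetical helper: closes the FILE* when the owning pointer goes out of scope.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Reads `path` in chunks of `chunk_size` bytes and calls `on_byte` once per byte.
// Returns the number of bytes processed, or -1 if the file could not be opened.
template <typename ByteFn>
long long for_each_byte(const char* path, ByteFn on_byte,
                        std::size_t chunk_size = 1 << 20) {
    FilePtr file(std::fopen(path, "rb"));  // "rb": binary mode, no translation
    if (!file) return -1;

    std::vector<unsigned char> buffer(chunk_size);
    long long total = 0;
    std::size_t n;
    // fread returns how many bytes it actually read; 0 means EOF or an error.
    while ((n = std::fread(buffer.data(), 1, buffer.size(), file.get())) > 0) {
        for (std::size_t i = 0; i < n; ++i) on_byte(buffer[i]);
        total += static_cast<long long>(n);
    }
    return total;
}
```

A call could look like `for_each_byte("data.bin", [&](unsigned char b) { histogram[b]++; });`, where `data.bin` and `histogram` are placeholders. Only one chunk is held in memory at a time, so the 10GB file size does not matter; the best chunk size is worth benchmarking on your own platform.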

John Zwinck
  • *"None of that will be faster than `fread()`"* - We need evidence for this kind of [ipsedixitism](https://en.wikipedia.org/wiki/Ipse_dixit). Got some numbers to back up your statement? – WhiZTiM Sep 24 '16 at 01:52
  • @WhiZTiM: I have tested these things many times. Anyone who wants to be 100% sure about their particular use case on their particular platform is welcome to write their own benchmarks. – John Zwinck Sep 24 '16 at 08:28

As you aren't concerned with performance,
a simple while loop with ifstream can extract one byte at a time:

#include <iostream>
#include <fstream>

int main(){

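  // open in binary mode so no byte is translated on the way in
  std::ifstream infile("file.txt", std::ios::binary);

  // get next byte; the loop ends as soon as get() fails at end of file,
  // so the last byte is not processed twice
  char c;
  while (infile.get(c)){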

    //process byte
    std::cout << c;
  }

}
Trevor Hickey
  • Surely you would not read a 10 GB input file one byte at a time. – John Zwinck Sep 24 '16 at 01:49
  • @JohnZwinck Why not? The code that's doing the actual I/O for you knows better and will read efficiently-sized chunks as needed unless you explicitly tell it not to do that. I can imagine plenty of applications that would demand reading byte by byte. – Carey Gregory Sep 24 '16 at 01:58
  • @xaxxon Words. Does "best algorithm" suit you better? – Carey Gregory Sep 24 '16 at 02:19