
I have a big file of real numbers, and I need to read them into doubles as fast as possible.

I can choose the file format myself (for example, one number per line, or several per line).

I tried scanf, which seems slow. I also tried gets(s) followed by parsing the doubles myself, which takes about the same time.

Is there a faster way?

Herokiller
  • Do you need to write them as text? Can you write the raw bytes? – Cornstalks Jul 28 '16 at 03:57
  • Let's be clear. If you have lines, you have ASCII, and if you have ASCII you don't have doubles in the file: you have *real numbers* in ASCII format. Doubles are the result of a conversion. – user207421 Jul 28 '16 at 03:57
  • @Cornstalks I don't need to write them, I need to use them as doubles afterwards – Herokiller Jul 28 '16 at 03:59
  • @EJP I need to use them as doubles after reading – Herokiller Jul 28 '16 at 03:59
  • I'd read them as binary if you can. Text parsing is slow. – Retired Ninja Jul 28 '16 at 04:03
  • @Herokiller You missed Cornstalks' point. If you have freedom of choosing format, choose binary. You will get better data-per-bit ratio, i.e. better use of IO, and you will get rid of string->double conversion overhead. That's a bump in both IO and CPU, so you're practically guaranteed to get some speed-up. – luk32 Jul 28 '16 at 04:03
  • The fastest way would be to make the file format identical to the in-memory format (i.e., a series of 64-bit binary floating-point values in the CPU's native-endian format) and just mmap() that file into memory. Then you can treat the file as if it were an in-memory array, and the OS will handle reading it off the disk as necessary. (Try to access its contents in sequential order as much as possible for best performance, though.) – Jeremy Friesner Jul 28 '16 at 05:47
  • I find the istream >> operator neat, though I'm not sure how fast it is compared to other solutions. I think it beats scanf because the format string no longer has to be parsed, but you have to benchmark to be sure. Try optimizing through the compiler first. Multithreading (pthreads or OpenCL) or SIMD could make it "faster" depending on your platform. – Andreas Jul 28 '16 at 09:07
  • [Why is the gets function so dangerous that it should not be used?](http://stackoverflow.com/q/1694036/995714) – phuclv May 04 '17 at 10:27
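
The mmap() suggestion in the comments above could look roughly like the sketch below. It assumes a POSIX system and a file that holds nothing but native-endian 8-byte doubles; the helper names `map_doubles` and `unmap_doubles` are made up for illustration and are not part of any library.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a file of raw native-endian doubles read-only.
   On success returns a pointer to the values and stores their count in
   *count; on any failure returns NULL. */
const double *map_doubles(const char *path, size_t *count)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    /* Reject empty files and files whose size is not a whole number of doubles. */
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        (size_t)st.st_size % sizeof(double) != 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *count = (size_t)st.st_size / sizeof(double);
    return p;
}

/* Hypothetical helper: release a mapping obtained from map_doubles. */
void unmap_doubles(const double *p, size_t count)
{
    munmap((void *)p, count * sizeof(double));
}
```

After a successful call the returned pointer can be indexed like an ordinary `const double` array, and the OS pages data in on demand. Whether this actually beats a plain fread depends on the platform and access pattern, so it is worth benchmarking.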

1 Answer


Try reading from a raw binary file. That avoids the text-to-double conversion entirely and is the fastest option.

Example: to fill two arrays x and y with 1000 values each, the binary file should hold the x-values in slots 0-999, followed by the y-values in slots 1000-1999. The function to use is fread. Do not forget that the element size for type double is 64 bits = 8 bytes.
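
A minimal sketch of that layout, assuming the file was written on the same machine (same endianness and 8-byte doubles). The function names `write_doubles` and `read_xy` are only illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative helper: write n doubles as raw bytes in the machine's
   native format. Returns the number of elements actually written. */
size_t write_doubles(const char *path, const double *src, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return 0;
    size_t written = fwrite(src, sizeof(double), n, f);
    if (fclose(f) != 0)
        return 0;
    return written;
}

/* Illustrative helper: read n values into x, then the next n values into y.
   Returns 1 if both arrays were filled completely, 0 otherwise. */
int read_xy(const char *path, double *x, double *y, size_t n)
{
    assert(sizeof(double) == 8); /* the file layout assumes 64-bit doubles */

    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    /* One fread per array: slots 0..n-1 go to x, slots n..2n-1 go to y. */
    int ok = fread(x, sizeof(double), n, f) == n
          && fread(y, sizeof(double), n, f) == n;
    fclose(f);
    return ok;
}
```

For the case described above, `read_xy(path, x, y, 1000)` would fill both arrays with two fread calls and no text parsing. A file produced this way is not portable to machines with a different byte order.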

Armen Avetisyan