
In Java, I have a 335 GB file that contains one number per line. I need to read it line by line, as if it were a stream of numbers; I must not keep all the data in memory. I was told that the Scanner class will not work. Could you please recommend the best way to do that?

2 Answers


None of the java.io input stream classes keeps all the data in memory. You are free to choose whichever suits you best, such as BufferedReader or DataInputStream.
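For example, a minimal sketch using BufferedReader; the file name `numbers.txt` and the `process` method are assumptions, not part of the question:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NumberStream {
    public static void main(String[] args) throws IOException {
        // Only the reader's internal buffer is resident at any time,
        // regardless of how large the file is.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("numbers.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                long value = Long.parseLong(line.trim()); // assumes one integer per line
                process(value);
            }
        }
    }

    private static void process(long value) {
        // placeholder: consume each number as it streams past
    }
}
```

`Files.lines` with a stream would work just as well; the point is that only one buffered chunk of the file is ever in memory.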

Joe

If you use BufferedReader, you should be able to get up to 90 MB/s in one thread.

You can use a trick to break up the file and read portions of the data concurrently, but this will only help if your disk's read throughput is high.

For example, you can memory-map the whole 335 GB file at once without using much heap. This works even if you have only a fraction of that amount of main memory.
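A minimal sketch of that approach, assuming an ASCII file of non-negative integers, one per line. A single MappedByteBuffer is capped at 2 GB, so a file this large has to be mapped in pieces; the 1 GB chunk size, the file name, and the `consume` method are assumptions:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedNumberReader {
    // A single MappedByteBuffer cannot exceed 2 GB, so map the file in chunks.
    private static final long CHUNK_SIZE = 1L << 30; // 1 GB per mapping (assumption)

    public static void main(String[] args) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("numbers.txt"),
                                                    StandardOpenOption.READ)) {
            long fileSize = channel.size();
            long number = 0;
            boolean inNumber = false;
            for (long pos = 0; pos < fileSize; pos += CHUNK_SIZE) {
                long size = Math.min(CHUNK_SIZE, fileSize - pos);
                MappedByteBuffer buffer =
                        channel.map(FileChannel.MapMode.READ_ONLY, pos, size);
                // Parse digits byte by byte; a number may span a chunk boundary,
                // so the partial value carries over to the next mapping.
                while (buffer.hasRemaining()) {
                    byte b = buffer.get();
                    if (b >= '0' && b <= '9') {
                        number = number * 10 + (b - '0');
                        inNumber = true;
                    } else if (inNumber) { // newline or other separator ends a number
                        consume(number);
                        number = 0;
                        inNumber = false;
                    }
                }
            }
            if (inNumber) {
                consume(number); // file may not end with a newline
            }
        }
    }

    private static void consume(long value) {
        // placeholder: process each parsed number (non-negative integers only)
    }
}
```

On a 64-bit JVM the address space easily accommodates a few hundred 1 GB mappings, and the mappings are released when garbage-collected; to read concurrently, each thread can map and scan its own range of chunks.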

What is the read transfer rate you can get with your disk subsystem?

Peter Lawrey
  • Why the reference to the hard number of 90 MB/s? My system surely allows more; others may be slower. I doubt that any trick will accelerate a task as simple as described. – Holger Feb 06 '15 at 17:52
  • @Holger The 90 MB/s is for a typical fast processor. If there is spare read throughput capacity, using memory-mapped files and multiple threads can help reach your maximum read throughput. E.g., I have exceeded 1.2 GB/s using an SSD and memory-mapped files. – Peter Lawrey Feb 06 '15 at 17:55
  • Multi-threading is unlikely to accelerate I/O, which goes serially through one bus. If you are talking about 1.2 GB/s, then parsing the numbers in parallel might indeed improve the throughput, but that’s actually proving that on your system the I/O is *not* the bottleneck. So I don’t believe that the same system only allows 90 MB/s when using `BufferedReader`… – Holger Feb 06 '15 at 18:02