
I need to manipulate a large file that cannot fit into memory. My code involves a lot of reads and writes, and my file only contains integers. Right now I am using

DataInputStream in = new DataInputStream(new BufferedInputStream(
            new FileInputStream(inPath)));
int i = in.readInt();

and

DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
            new FileOutputStream(outPath)));
out.writeInt(i);

for reading and writing integers.

However, constantly reading and writing leads to really bad performance. After profiling my code, I found that most of the time is spent in readInt() and writeInt(). How can I improve the performance of reading and writing integers?

Terence Lyu

2 Answers


When dealing with large amounts of data, file IO is very often a crucial bottleneck for performance.

The option space for overcoming that problem is pretty wide:

  • when using a single machine, you might distribute your data on multiple disks, so that IO traffic isn't stalled by all requests going to a single device (a minimal sketch of that follows at the end of this answer)
  • obviously: faster disk hardware (SSDs, or NVMe)
  • scale out: not only multiple disks, but also multiple compute nodes
  • opening up even more dimensions, such as network file systems, or file systems specifically optimised for dealing with large data

These ideas are pretty generic, but so is your question. Don't expect a perfect solution: "perfect" solutions come from carefully designing an overall architecture that can do what you need, followed by even more careful tuning of all the relevant settings in that setup.
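
For the first bullet, here is a minimal sketch of what "distribute your data on multiple disks" could look like in code. It is an illustration only: the class name and paths are hypothetical, and the paths are assumed to sit on different physical disks.

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class StripedIntWriter implements AutoCloseable {
    private final DataOutputStream[] stripes;
    private long count = 0;

    // paths are assumed to be mount points on different physical disks,
    // e.g. "/disk0/part.bin", "/disk1/part.bin", ...
    public StripedIntWriter(String... paths) throws IOException {
        stripes = new DataOutputStream[paths.length];
        for (int s = 0; s < paths.length; s++) {
            stripes[s] = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(paths[s])));
        }
    }

    // round-robin: consecutive ints land on different devices
    public void writeInt(int i) throws IOException {
        stripes[(int) (count++ % stripes.length)].writeInt(i);
    }

    @Override
    public void close() throws IOException {
        for (DataOutputStream s : stripes) {
            s.close();
        }
    }
}

A reader would have to merge the stripes back in the same round-robin order, so this only pays off when the files really are on separate devices.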

GhostCat

Try a BufferedDataInputStream. There are many implementations on the internet, such as https://github.com/nom-tam-fits/nom-tam-fits/blob/master/src/main/java/nom/tam/util/BufferedDataInputStream.java
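
The idea behind such a class is to pull a large block of bytes into memory in one go and decode many ints from it, rather than paying per-call overhead in readInt()/writeInt(). If you don't want to add the dependency, here is a rough sketch of the same idea with plain java.nio. This is not the linked class; the 1 MB buffer size and the command-line paths are placeholders.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BulkIntCopy {
    public static void main(String[] args) throws IOException {
        Path inPath = Path.of(args[0]);   // placeholder: source file of big-endian ints
        Path outPath = Path.of(args[1]);  // placeholder: destination file

        try (FileChannel in = FileChannel.open(inPath, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(outPath,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {

            ByteBuffer readBuf = ByteBuffer.allocateDirect(1 << 20);
            ByteBuffer writeBuf = ByteBuffer.allocateDirect(1 << 20);

            while (in.read(readBuf) != -1) {
                readBuf.flip();
                IntBuffer ints = readBuf.asIntBuffer(); // int view over the bytes read
                while (ints.hasRemaining()) {
                    int value = ints.get();   // the "readInt()"
                    writeBuf.putInt(value);   // the "writeInt()" (process value here)
                    if (!writeBuf.hasRemaining()) {
                        writeBuf.flip();
                        while (writeBuf.hasRemaining()) out.write(writeBuf);
                        writeBuf.clear();
                    }
                }
                // keep any trailing partial int for the next pass
                readBuf.position(ints.position() * 4);
                readBuf.compact();
            }
            // flush whatever is left in the write buffer
            writeBuf.flip();
            while (writeBuf.hasRemaining()) out.write(writeBuf);
        }
    }
}

DataOutputStream writes ints big-endian, which is also ByteBuffer's default byte order, so files produced by your current code stay compatible with this sketch.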

vipcxj