5

how do I load a file into main memory?

I currently read the file like this:

BufferedReader buf = new BufferedReader(new FileReader(fileName));

I presume that this is reading the file line by line from the disk. What is the advantage of this?

What is the advantage of loading the file directly into memory? How do we do that in Java?

I found some examples using the Scanner and RandomAccessFile classes. Do they load the file into memory? Should I use them? Which of the two should I use?

Thanks in advance!!!

Paul Vargas
  • What does your [profiler](http://stackoverflow.com/q/2064427/230513) say? – trashgod Oct 27 '12 at 02:01
  • Where do you think your heap is? ("Load the file into memory" is a meaningless expression.) – Hot Licks Oct 27 '12 at 02:25
  • I don't have a profiler. I run the program on a Hadoop cluster and monitor it using Cygwin. I want a way to load the file directly into memory instead of reading it line by line from disk. I think the heap is dynamic memory allocation, but apart from that I don't have an idea about it. Please help! – Mahalakshmi Lakshminarayanan Oct 27 '12 at 02:35
  • What do you intend to do with the file after you "load" it? How big is it? – Hot Licks Oct 27 '12 at 02:39
  • I want to read selected data from it. I don't know the size of the file; I use it in the Hadoop reduce class. I think for small data sets the file will be small enough to fit in memory. I felt that reading the lines of the file one by one from disk was making the program run slow, so I wanted to load it into memory, read it line by line, and extract the wanted information from it. – Mahalakshmi Lakshminarayanan Oct 27 '12 at 03:05

2 Answers

7
BufferedReader buf = new BufferedReader(new FileReader(fileName));

I presume that this is reading the file line by line from the disk. What is the advantage of this?

Not exactly. It is reading the file in chunks whose size is the default buffer size (8192 characters for a BufferedReader).

The advantage is that you don't need a huge heap to read a huge file. This is a significant issue since the maximum heap size can only be specified at JVM startup (with Hotspot Java).

You also don't consume the system's physical / virtual memory resources to represent the huge heap.

What is the advantage of loading the file directly into memory?

It reduces the number of system calls, and may read the file faster. How much faster depends on a number of factors. And you have the problem of dealing with really large files.

How do we do that in Java?

  1. Find out how large the file is.
  2. Allocate a byte (or character) array big enough.
  3. Use the relevant read(byte[], int, int) or read(char[], int, int) method to read the entire file.
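The three steps above can be sketched like this. This is a minimal illustration, not a robust utility: the `readFully` name is mine, and it assumes the file fits comfortably in the heap (a byte array is limited to `Integer.MAX_VALUE` elements).

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class LoadWholeFile {

    // Read an entire file into a byte array in one pass.
    static byte[] readFully(File file) throws IOException {
        long length = file.length();                       // 1. find out how large the file is
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File too large for a single array: " + file);
        }
        byte[] bytes = new byte[(int) length];             // 2. allocate a big enough array
        try (InputStream in = new FileInputStream(file)) {
            int off = 0;
            while (off < bytes.length) {                   // 3. read(byte[], int, int) may return
                int n = in.read(bytes, off, bytes.length - off); // fewer bytes than requested,
                if (n < 0) {                               //    so loop until the array is full
                    throw new EOFException("Unexpected end of file: " + file);
                }
                off += n;
            }
        }
        return bytes;
    }
}
```

The loop in step 3 matters: a single `read` call is not guaranteed to fill the array.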

You can also use a memory-mapped file ... but that requires using the Buffer APIs which can be a bit tricky to use.
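A memory-mapped read might look like the sketch below (the `countBytes` method and path argument are placeholders of mine). The OS pages the file in on demand rather than copying it into the Java heap, and you read it through the `MappedByteBuffer` API:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRead {

    // Map the whole file and walk it via the Buffer API.
    static long countBytes(String path) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long count = 0;
            while (buffer.hasRemaining()) {   // relative get() advances the position
                buffer.get();
                count++;
            }
            return count;
        }
    }
}
```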

I found some examples on Scanner or RandomAccessFile methods. Do they load the files into memory?

No, and no.

Should I use them? Which of the two should I use ?

Do they provide the functionality that you require? Do you need to read / parse text-based data? Do you need to do random access on binary data?

Under normal circumstances, you should choose your I/O APIs based primarily on the functionality that you require, and secondarily on performance considerations. Using a BufferedInputStream or BufferedReader is usually enough to get acceptable* performance if you intend to parse the file as you read it. (But if you actually need to hold the entire file in memory in its original form, then a BufferedXxx wrapper class actually makes reading a bit slower.)
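To illustrate the functionality-first point, here is a rough sketch of the two APIs you mentioned (method names and files are placeholders of mine): Scanner is convenient for parsing text tokens as a stream, while RandomAccessFile lets you seek to an arbitrary offset in binary data. Neither one loads the whole file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Scanner;

public class AccessStyles {

    // Scanner: parse whitespace-separated integers from a text file.
    static int sumInts(File textFile) throws IOException {
        int sum = 0;
        try (Scanner scanner = new Scanner(textFile)) {
            while (scanner.hasNextInt()) {
                sum += scanner.nextInt();
            }
        }
        return sum;
    }

    // RandomAccessFile: jump straight to a given offset in a binary file.
    static byte byteAt(File binFile, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(binFile, "r")) {
            raf.seek(offset);       // no need to read the preceding bytes
            return raf.readByte();
        }
    }
}
```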


* - Note that acceptable performance is not the same as optimal performance, but your client / project manager probably would not want you to waste time writing code to perform optimally ... if this is not a stated requirement.

Stephen C
  • I need to read and parse the file. I am using this for a Hadoop map-reduce program. I am trying to read the files from the disk using a buffered reader, but this seems to take a lot of time. So I was wondering whether I should load the entire file into memory, which may improve performance. – Mahalakshmi Lakshminarayanan Oct 27 '12 at 04:04
  • You need to profile your application to find out exactly where it is spending its time in the reading / parsing. – Stephen C Oct 27 '12 at 05:01
  • Note that if your intent is to read in the entire file without parsing, a Buffered wrapper will only add an extra copy to the operation. However, if you're reading the file, parsing it, then never again referencing the file, you *want* a buffered reader, and reading the entire file at once is probably a bad idea. – Hot Licks Oct 27 '12 at 13:14
4

If you're reading in the file and then parsing it, walking from beginning to end once to extract your data, then not referencing the file again, a buffered reader is about as "optimal" as you'll get. You can "tune" the performance somewhat by adjusting the buffer size: a larger buffer will read larger chunks from the file. (Make the buffer size a power of 2, e.g. 262144.) Reading in an entire large file (larger than, say, 1 MB) will generally cost you performance in paging and heap management.
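The buffer-size tuning above can be sketched as follows; the `countLines` method is just an illustration of mine showing where the size goes:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class TunedReader {

    // The second constructor argument overrides BufferedReader's
    // default buffer size (8192 chars), e.g. 262144 for bigger chunks.
    static int countLines(File file, int bufferSize) throws IOException {
        int lines = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(file), bufferSize)) {
            while (in.readLine() != null) {
                lines++;        // parse each line as you go instead of holding the file
            }
        }
        return lines;
    }
}
```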

Hot Licks