I am trying to read a huge file (> 1 GB). I am thinking that reading it as a random access file with a buffered reader would be efficient.

I need to read the file line by line and parse it.

However, being new to the Java IO API, I'm not sure how to do this.

I'd appreciate your help.

user1203861
  • You can only read a random byte, not a random character (as characters can vary in length). What are you trying to do? These classes don't work well together. – Peter Lawrey Jul 26 '12 at 15:21
  • What are you trying to do? Do you need to read the entire file? Read something at a fixed offset in the middle of the file? Read something that you have to search for in the middle of the file? Are you just trying to make a read of the entire file "go faster"? Q: What exactly is the "problem" you're trying to resolve? – paulsm4 Jul 26 '12 at 15:23
  • I need to read the file line by line and parse it, and I need it to be as fast as possible. – user1203861 Jul 26 '12 at 15:29
  • `BufferedReader` and `RandomAccessFile` are completely orthogonal concepts. Buffered reader does character decoding (as do all `Reader`s) and buffers the input so that it can find line endings and thus give you whole lines at a time. Random access files are for reading from an arbitrary byte index in a file. What are you really trying to do? – Mark Peters Jul 26 '12 at 15:32
  • @user: if you want to read the file line by line, forget using random access; random access is only useful if you want to jump to a specific place in the file and avoid reading everything before it. You don't want that: you want to read every line in order. Just use a `BufferedReader`, of which there are many examples (e.g. http://stackoverflow.com/questions/2500107/how-should-i-read-from-a-buffered-reader) – Mark Peters Jul 26 '12 at 15:36
  • If you "need to read the file line by line", then you want a buffered reader :) – paulsm4 Jul 26 '12 at 16:38
  • PS: If you're doing a lot of processing on the string you read, you probably also want to use StringBuilder (you *don't* want to do a lot of processing on "String"). IMHO... – paulsm4 Jul 26 '12 at 16:39
  • If you are dealing with a variable-length file where data keeps coming (e.g. a log file), I wrote a reader for myself; it's here: http://stackoverflow.com/a/19867481/1282907 – srikanth yaradla Dec 13 '13 at 10:12
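To make Peter Lawrey's point concrete: with a variable-width encoding such as UTF-8 (assumed here), one character can span several bytes, so an arbitrary byte offset can land in the middle of a character. A small sketch; the class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteVsChar {
    // Decodes the UTF-8 bytes of text starting at an arbitrary byte offset,
    // similar to what happens after seeking a RandomAccessFile to that offset.
    static String decodeFrom(String text, int byteOffset) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(Arrays.copyOfRange(bytes, byteOffset, bytes.length),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "na\u00EFve"; // "naive" with i-diaeresis: 5 chars, but that char needs 2 bytes
        System.out.println(text.length());                                 // 5
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);  // 6
        // Byte offset 3 is the second byte of the 2-byte character, so decoding
        // starts mid-character and that stray byte becomes U+FFFD.
        System.out.println(decodeFrom(text, 3).equals("\uFFFDve"));        // true
    }
}
```

This is why the comments recommend a plain sequential `BufferedReader` when you want every line: it decodes from the start of the file, so it never begins in the middle of a character.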

3 Answers

You can use Java's BufferedReader for this:

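import java.io.BufferedReader;
import java.io.FileReader;

// try-with-resources closes the reader (and the file) even if an exception is thrown
try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
  String line;
  while ((line = reader.readLine()) != null) {
    // Do some stuff with the line
  }
}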

fileName is the path to the file you want to read.

jayeff
  • Note that `FileReader` assumes that the given file is encoded with the default character encoding. There's no way to tell it otherwise. – seh Jul 26 '12 at 15:58
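Since Java 11, `FileReader` also has a constructor that takes a `Charset`, so seh's note applies to older Java versions. Below is a sketch of the same loop with an explicit encoding, using `java.nio.file.Files.newBufferedReader` and try-with-resources. UTF-8 and the names `LineByLine`/`processLines` are assumptions; substitute the encoding your file actually uses:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

public class LineByLine {
    // Reads the file sequentially, one line at a time, decoding it as UTF-8
    // (an assumption: pass the charset your file was really written in).
    // The reader keeps only a small buffer plus the current line in memory.
    static long processLines(Path file, Consumer<String> parser) throws IOException {
        long count = 0;
        // try-with-resources closes the reader even if the parser throws
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                parser.accept(line); // parse the line here
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long n = processLines(Paths.get(args[0]), line -> { /* parse the line */ });
        System.out.println(n + " lines read");
    }
}
```

One difference to be aware of: the reader returned by `Files.newBufferedReader` throws a `MalformedInputException` on bytes that are invalid in the given charset, whereas `FileReader` silently substitutes replacement characters.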
Do you need to read all of it, and from the beginning? If you know the byte offset you want to start at, you can use a RandomAccessFile to jump to that part of the file; its seek(long) method does this.
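A minimal sketch of that idea, assuming you already know a byte offset that starts a line (for example, one recorded on an earlier pass) and that the file is UTF-8. It collects the bytes itself rather than calling `RandomAccessFile.readLine()`, because `readLine()` turns each byte into one `char` and therefore cannot decode multi-byte characters. `SeekExample` and `readLineAt` are hypothetical names:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class SeekExample {
    // Jumps straight to byteOffset (nothing before it is read), then collects
    // bytes up to the next '\n' or end of file and decodes them as UTF-8.
    // byteOffset is assumed to point at the start of a line.
    static String readLineAt(String fileName, long byteOffset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            raf.seek(byteOffset);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            int b;
            while ((b = raf.read()) != -1 && b != '\n') {
                buf.write(b);
            }
            String line = new String(buf.toByteArray(), StandardCharsets.UTF_8);
            // Drop the '\r' of a Windows-style "\r\n" line ending, if present.
            return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SeekExample <file> <byteOffset>
        System.out.println(readLineAt(args[0], Long.parseLong(args[1])));
    }
}
```

Reading one byte at a time from a `RandomAccessFile` is unbuffered, which is acceptable for a single line but slow for a whole file; for a full pass over every line, a `BufferedReader` is the better fit.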

RNJ
While this is perfectly doable in Java, I want to offer a suggestion based on my experience:

If you're on a Unix platform, you can use an external shell script to search through gigabytes of logs. sed is well suited to this. Specific usage here: http://www.grymoire.com/Unix/Sed.html

Call the shell script from your Java code whenever you need to read or grep through the log file.

How?

1) In your Java code, use the ProcessBuilder class. Its constructor takes the command to run, such as your shell script:

ProcessBuilder obj = new ProcessBuilder("FastLogRead.sh");

2) Start it, which gives you a Process object:

Process process = obj.start();

3) Read the script's output directly through a BufferedReader:

BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream()));
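Putting the three steps together, here is a sketch that calls sed directly instead of a wrapper script (the contents of FastLogRead.sh aren't given in this answer). It assumes a Unix-like system with `sed` on the PATH, a UTF-8 log, and a regex containing no `/` characters; `ExternalFilter` and `sedMatches` are illustrative names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class ExternalFilter {
    // Runs `sed -n '/<regex>/p' <fileName>`, which prints only the matching lines,
    // and hands each output line to onMatch. Returns the number of matches.
    static int sedMatches(String regex, String fileName, Consumer<String> onMatch)
            throws IOException, InterruptedException {
        // Each argument is passed separately, so no shell quoting is involved.
        ProcessBuilder pb = new ProcessBuilder("sed", "-n", "/" + regex + "/p", fileName);
        // Send sed's error output to this JVM's stderr so an unread pipe cannot block sed.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = pb.start();

        int count = 0;
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                onMatch.accept(line);
                count++;
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("sed exited with status " + exitCode);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ExternalFilter <regex> <logFile>
        int n = sedMatches(args[0], args[1], System.out::println);
        System.out.println(n + " matching lines");
    }
}
```

Passing sed's output to a callback rather than collecting it in a list keeps memory use flat even when many lines match.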

Pros:

In my case, this sped up execution by about 10 times on average (I searched through a log file of around 4 GB).

Cons:

Some developers don't like bringing even lightweight shell scripts into the realm of Java, and prefer to stay with Java's own classes such as RandomAccessFile. That is a fair position.

In your case, you can choose between standardization and performance.

Vishal Verma