
Suppose there is an input file with tons of records, one record per line, and each record consists of an id number, the time the record was created, and the record content. What is the best way to read and parse the file?

For example, the input is:

123-456-789   1:23pm Jan 4, 2014   I AM THE CONTENT!  
987-654-321   3:21pm Apr1, 2014    I AM THE CONTENT TOO!   
…  

To read one line at a time, I believe there is not much difference between Scanner and BufferedReader, because Scanner also has a 1k buffer. So may I do:

Scanner scan = new Scanner(new File("filename"))?

Then, after I get one line, should I create another Scanner object to parse the line and get each field (I can give the line as the input to the Scanner)? Or is there a better solution?

For an experienced programmer, what is the best way (fast, good performance) to read and parse such a file with tons of records in the real world? Thank you!

Bruce Martin
lkkeepmoving
  • Why do you think you need new scanner objects? A 1k buffer doesn't mean you can only read files up to 1k in size, but that it will read only 1k of data at a time - from any file size. Just use one `Scanner` and parse a file of whatever size. Why not just try it? Don't optimize up front. – Zoltán Mar 09 '14 at 00:56
  • It looks like a fixed width file, you could look at the fixed width libraries (see http://stackoverflow.com/questions/1609807/whats-the-best-way-of-parsing-a-fixed-width-formatted-file-in-java) – Bruce Martin Mar 09 '14 at 01:40

1 Answer

Unless 'tons' means hundreds of millions of lines it isn't likely to make any significant difference which you use, but you only need one Scanner object for this task, not one per line.

NB BufferedReader has an 8k-character buffer by default, so your only stated reason for thinking there is 'not much difference' is out the window. The fact that Scanner is a higher-level API with tokenising features also seems to have escaped you.
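
As a rough illustration of the "one reader for the whole file" point, here is a minimal sketch that uses a single BufferedReader and a plain `String.split` per line instead of a second Scanner. It assumes the three fields are separated by runs of two or more whitespace characters, as in the sample lines in the question; that separator rule, the `RecordParser` class, and the `Entry` holder are all hypothetical names and assumptions, not something the question specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class RecordParser {

    /** Hypothetical holder for one parsed line. */
    static final class Entry {
        final String id;
        final String created;
        final String content;

        Entry(String id, String created, String content) {
            this.id = id;
            this.created = created;
            this.content = content;
        }
    }

    /**
     * Splits one line into id, creation time, and content.
     * Assumption: fields are separated by two or more whitespace characters.
     * The limit of 3 keeps any later double spaces inside the content field.
     */
    static Entry parseLine(String line) {
        String[] parts = line.trim().split("\\s{2,}", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed record: " + line);
        }
        return new Entry(parts[0], parts[1], parts[2]);
    }

    /** Reads every non-blank line from the source through a single BufferedReader. */
    static List<Entry> parseAll(Reader source) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                entries.add(parseLine(line));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample taken from the question; a real run would pass a file reader.
        String sample =
                "123-456-789   1:23pm Jan 4, 2014   I AM THE CONTENT!  \n"
              + "987-654-321   3:21pm Apr1, 2014    I AM THE CONTENT TOO!   \n";
        for (Entry e : parseAll(new StringReader(sample))) {
            System.out.println(e.id + " | " + e.created + " | " + e.content);
        }
    }
}
```

For an actual file, something like `parseAll(Files.newBufferedReader(Paths.get("filename")))` should work. Whether this is measurably faster than a single Scanner depends on the file size, so it is worth timing both before committing to either.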

user207421