What is the most efficient way (in terms of time) to read a text file into a list or array? The files range from 100 MB to 2 GB. Each file contains data in the following format:

From      TO          time     

a         b      13 decc 2009
b         c      13 decc 2009
c         d      13 decc 2009
f         h      13 decc 2009
f         g      13 decc 2009

Edit: The following is the code for reading the file:

public List<InputDataBean> readInputData() throws Exception {
    List<InputDataBean> dataSet = new ArrayList<InputDataBean>();
    FileInputStream fstream = null;
    BufferedReader br = null;
    try {
        fstream = new FileInputStream(filePath);
        br = new BufferedReader(new InputStreamReader(fstream));
        String strLine;
        Set<String> users = new TreeSet<String>();
        while ((strLine = br.readLine()) != null) {
            InputDataBean data = validateRecord(strLine);
            // Skip lines for which validateRecord returns null.
            if (data == null)
                continue;
            dataSet.add(data);
            users.add(data.getFromName());
            users.add(data.getToName());
        }
        UserKeys.setUsers(users);

    } catch (Exception e) {
        throw e;
    } finally {
        try {
            // Closing the reader also closes the underlying stream.
            if (null != br)
                br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return dataSet;
}

After reading the file, I want to store the data in an array, not in a database.

Is there a better alternative for reading the file? Would it be a good idea to call a script from the Java program, read the data using the script, and store it in a Java array?

P.S.: I would really appreciate it if anybody could edit or improve the tags.

Peter Lawrey
Raje
  • Don't forget to try using something like ensureCapacity() – Mikhail Dec 13 '11 at 06:02
  • what are you doing with the data? if it goes to a database, you should use a tool that your database provides (most databases do). storing about 2 GB of data into the heap (as you read the file) may not be a great idea... generally, buffered readers are fine if you have to do this in java. – aishwarya Dec 13 '11 at 06:05
  • @aishwarya: I have added my file-reading code to the question. After reading the data, I want to store it in an array and perform some operations on it. We are not storing the data in a database. – Raje Dec 13 '11 at 06:22
  • Firstly, how exactly are you reading your files? There is no sample code that anyone could use as a basis for suggestions. Secondly, what is your expected standard? – thotheolh Dec 13 '11 at 06:01
  • There are some questions asked in stackoverflow regarding the parsing of tab delimited files in Java. I found one here: http://stackoverflow.com/questions/1635764/string-parsing-in-java-with-delimeter-tab-t-using-split – thotheolh Dec 13 '11 at 06:03
  • @thotheolh: Thanks for the suggestion. Sorry, I want to read the file in an efficient way (in terms of time). – Raje Dec 13 '11 at 06:14

1 Answer

Possibly wrapping a BufferedInputStream around the FileInputStream will further improve performance a bit (because reads will be buffered in multiples of 4 KB). You could also play a bit with the buffer size.
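As a rough illustration of that suggestion, here is a minimal sketch that wraps the FileInputStream in a BufferedInputStream and makes the buffer size a parameter so different values can be timed. The class name BufferedLineReader, the 64 KB starting value, and the US-ASCII charset are assumptions for the example, not part of the question's code, and whether this actually beats a plain BufferedReader has to be measured on the real files.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this example only.
final class BufferedLineReader {

    // Assumed starting point for tuning; try several sizes and measure.
    static final int BUFFER_SIZE = 64 * 1024;

    // Reads all lines of the file, buffering raw bytes and decoded chars
    // with the given buffer size.
    static List<String> readLines(String path, int bufferSize) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(
                        new BufferedInputStream(new FileInputStream(path), bufferSize),
                        StandardCharsets.US_ASCII), // assumption: the data is plain ASCII
                bufferSize)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Calling BufferedLineReader.readLines(filePath, BufferedLineReader.BUFFER_SIZE) with a few different sizes and timing each run is probably the simplest way to see whether the extra buffering helps at all.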

If you know it's just ASCII, you could avoid using a Reader and possibly avoid creating a String for each line.
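One way this could look, as a sketch only: read raw bytes in large chunks, collect each line into a reusable byte buffer, and hand the buffer to a callback so that a String is created only for the fields that are actually needed. AsciiLineScanner, LineHandler, and firstField are hypothetical names invented for this example, and the code assumes single-byte ASCII input with fields separated by spaces or tabs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, named for this example only.
final class AsciiLineScanner {

    // Hypothetical callback: receives each line as raw bytes in a reused buffer.
    // The buffer contents are only valid during the call.
    interface LineHandler {
        void onLine(byte[] buf, int length);
    }

    // Scans the stream line by line without a Reader and without per-line Strings.
    static void scan(InputStream in, LineHandler handler) throws IOException {
        byte[] chunk = new byte[64 * 1024]; // assumed chunk size; tune by measuring
        byte[] line = new byte[256];        // grows on demand for longer lines
        int lineLen = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                byte b = chunk[i];
                if (b == '\n') {
                    handler.onLine(line, trimCr(line, lineLen));
                    lineLen = 0;
                } else {
                    if (lineLen == line.length) {
                        line = Arrays.copyOf(line, lineLen * 2);
                    }
                    line[lineLen++] = b;
                }
            }
        }
        if (lineLen > 0) {
            // Last line without a trailing newline.
            handler.onLine(line, trimCr(line, lineLen));
        }
    }

    // Drops a trailing carriage return so \r\n files are handled as well.
    private static int trimCr(byte[] line, int len) {
        return (len > 0 && line[len - 1] == '\r') ? len - 1 : len;
    }

    // Builds a String for the first space- or tab-delimited field only.
    static String firstField(byte[] buf, int length) {
        int end = 0;
        while (end < length && buf[end] != ' ' && buf[end] != '\t') {
            end++;
        }
        return new String(buf, 0, end, StandardCharsets.US_ASCII);
    }
}
```

Passing a FileInputStream to scan and extracting fields inside the callback avoids the charset decoding step entirely; whether the saving is noticeable next to the cost of building the result objects is something only a benchmark on the real data can tell.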

If you have the time, I would compare the performance of your solution with existing CSV reader tools, such as the CSV tool from the H2 database (disclosure: I wrote it).

Thomas Mueller