I have very large CSV files that I'm trying to iterate through. I'm using opencsv, and I'd like to use CsvToBean so that I can set the column mappings dynamically from a database. My question is how to do this without reading the entire file into a list, since I'm trying to avoid out-of-memory errors.

I'm currently passing the entire result set into a list like so.

List<MyObject> myObjects = csv.parse(strat, getReader("file.txt"));

for (MyObject myObject : myObjects) {
    System.out.println(myObject);
}

But I found this iterator method, and I'm wondering whether it reads one row at a time rather than loading the entire file at once:

Iterator myObjects = csv.parse(strat, getReader("file.txt")).iterator();

while (myObjects.hasNext()) {
    MyObject myObject = (MyObject) myObjects.next();
    System.out.println(myObject);
}

So my question is what is the difference between Iterator and list?

Mukesh Singh Rathaur
Code Junkie
  • possible duplicate of [List vs List iterator](http://stackoverflow.com/questions/8411302/list-vs-list-iterator) – Subodh Joshi Jul 21 '15 at 05:51
  • http://stackoverflow.com/questions/2113216/which-is-more-efficient-a-for-each-loop-or-an-iterator – Subodh Joshi Jul 21 '15 at 05:52
  • Either way, CsvToBean will always parse the entire file into a list and return that (according to the source I found via Google). If you want to process an arbitrarily large file, you will want a parser that reads one line at a time and returns one bean at a time. – slipperyseal Jul 21 '15 at 06:39
  • Reading a large CSV file all at once is not a good solution. It is better to read the file in chunks. You can use multiple threads: one to read the data from the file and a few others to perform the business logic. More details on reading CSV data in chunks are in [How to parse chunk by chunk a large CSV file and bulk insert to a database](http://www.codeproject.com/Articles/543789/How-to-parse-chunk-by-chunk-a-large-CSV-file-and-b), and a multi-threaded solution is [here](https://stackoverflow.com/questions/11098873/how-to-split-a-csv-file-into-multiple-chunks-and-read-those-chunks-in-parallel) – Mukesh Singh Rathaur Jul 21 '15 at 05:57

2 Answers

The enhanced for loop (for (MyObject myObject : myObjects)) is implemented using the Iterator (it requires that the instance returned by csv.parse(strat, getReader("file.txt")) implements the Iterable interface, which contains an iterator() method that returns an Iterator), so there's no performance difference between the two code snippets.

P.S.

In the second snippet, don't use the raw Iterator type; use Iterator<MyObject> instead:

Iterator<MyObject> myObjects = csv.parse(strat, getReader("file.txt")).iterator();

while (myObjects.hasNext()) {
    MyObject myObject = myObjects.next();
    System.out.println(myObject);
}
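
To illustrate the point about the enhanced for loop, here is a minimal, self-contained sketch using only the JDK. The class name LoopEquivalence and the sample rows are made up for the example and have nothing to do with opencsv; the sketch only shows that both loop styles walk the same already-built list in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class: compares the two loop styles on an in-memory list.
public class LoopEquivalence {

    // Enhanced for loop: the compiler calls rows.iterator() behind the scenes.
    static List<String> viaForEach(List<String> rows) {
        List<String> seen = new ArrayList<>();
        for (String row : rows) {
            seen.add(row);
        }
        return seen;
    }

    // Explicit iterator: the same traversal written out by hand.
    static List<String> viaIterator(List<String> rows) {
        List<String> seen = new ArrayList<>();
        Iterator<String> it = rows.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        // Stand-in data; in the question this list would come from csv.parse(...).
        List<String> rows = Arrays.asList("a,1", "b,2", "c,3");
        System.out.println(viaForEach(rows).equals(viaIterator(rows))); // prints "true"
    }
}
```

In both methods the list already exists in full before the loop starts, so neither style changes how much memory is used.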
Eran
  • So by the sounds of it, I'd need to use their iterator method and implement my own CsvToBean. – Code Junkie Jul 21 '15 at 05:53
  • Thanks for the tip, but it doesn't look like using iterator is going to resolve my memory issues :/ – Code Junkie Jul 21 '15 at 05:54
  • @CodeJunkie The question is whether the `csv` instance you are using can supply an `Iterator` that doesn't require creation of a List first (since creation of a List requires reading all the data in advance). Such an Iterator (if exists) may read data from the file on demand (when you call the `hasNext()` or `next()` method). – Eran Jul 21 '15 at 05:58

"what is the difference between Iterator and list?"

A List is a data structure that stores its elements and gives the user operations such as get(), toArray(), etc.

An Iterator only lets the user step through the elements of a data structure, provided that structure implements the Iterable interface (which every Collection in the Java standard library does).

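So List<MyObject> myObjects = csv.parse(strat, getReader("file.txt")); physically stores all of the data in myObjects,

and Iterator<MyObject> myObjects = csv.parse(strat, getReader("file.txt")).iterator(); asks that same list for an iterator over its elements. Because csv.parse returns a List here, the complete list is still built before iteration begins, so memory use is the same in both cases.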
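
For contrast, here is a rough sketch of what an iterator that reads on demand could look like, using only the JDK. Everything in it is an assumption made for illustration: LazyRowIterator is a hypothetical class, not part of opencsv, and splitting on commas is a stand-in for real CSV parsing (it does not handle quoted fields). The point is only that each row is read when next() is called, so at most one pending line is held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical class: yields one mapped object per line, reading lazily.
public class LazyRowIterator<T> implements Iterator<T> {
    private final BufferedReader reader;
    private final Function<String[], T> mapper; // turns raw fields into a bean
    private String nextLine;                    // the single line read ahead

    public LazyRowIterator(BufferedReader reader, Function<String[], T> mapper) {
        this.reader = reader;
        this.mapper = mapper;
        advance();
    }

    // Reads exactly one more line; nextLine becomes null at end of input.
    private void advance() {
        try {
            nextLine = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public T next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        // Naive split: an assumption for the sketch, not real CSV handling.
        String[] fields = nextLine.split(",", -1);
        advance();
        return mapper.apply(fields);
    }

    public static void main(String[] args) {
        // A StringReader stands in for the file reader from the question.
        BufferedReader in = new BufferedReader(new StringReader("a,1\nb,2\n"));
        Iterator<String> rows = new LazyRowIterator<String>(in, f -> f[0] + "=" + f[1]);
        while (rows.hasNext()) {
            System.out.println(rows.next()); // prints "a=1" then "b=2"
        }
    }
}
```

Whether the csv object in the question can hand out an iterator like this depends on the library version, so check its documentation before writing your own.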

Anuswadh