5

I'm trying to figure out the best approach to parsing a CSV file in Java. Each line can hold a different number of values. For example, the first line can have up to 5 comma-separated strings, while the next few lines can have 3, 6, or any other number.

To be clear, my problem isn't reading the strings from the file. My problem is choosing a data structure to hold each line and each word within that line.

At first I thought about using a 2D array, but array sizes are fixed once allocated, and the second dimension (the number of words in each line) differs from line to line.

Here are the first few lines of the CSV file:

0,MONEY
1,SELLING
2,DESIGNING
3,MAKING
DIRECTOR,3DENT95VGY,EBAD,SAGHAR,MALE,05/31/2011,null,0,10000,07/24/2011
3KEET95TGY,05/31/2011,04/17/2012,120050
3LERT9RVGY,04/17/2012,03/05/2013,132500
3MEFT95VGY,03/05/2013,null,145205
DIRECTOR,XKQ84P6CDW,AGHA,ZAIN,FEMALE,06/06/2011,null,1,1000,01/25/2012
XK4P6CDW,06/06/2011,09/28/2012,105000
XKQ8P6CW,09/28/2012,null,130900
DIRECTOR,YGUSBQK377,AYOUB,GRAMPS,FEMALE,10/02/2001,12/17/2007,2,12000,01/15/2002
Raedwald
user3108505
  • You are looking for a data structure for each line, but the lines seem to represent different types of information. E.g. "0,MONEY" is quite possibly not the same type of information as "DIRECTOR,...". If the information itself is not of the same type, finding a single data structure that meaningfully holds these bits of information seems not-too-sensible. – Chthonic Project Feb 08 '14 at 06:05
  • @ChthonicProject Correct, although most CSV files can get filled with headers or footers. Or at least, it's not uncommon where I'm at. – mrres1 Feb 08 '14 at 06:09
  • @ChthonicProject What would you suggest I do instead? I was gonna use a universal data structure first and then put each piece of information into a separate container class, e.g. Money would go to a money container, and Director would go to a director container. – user3108505 Feb 08 '14 at 06:12
  • You could use a `Map<Integer, List<String>>`. The keys would be the line numbers in the CSV file, and each `List<String>` would hold the words in that line. – Chthonic Project Feb 08 '14 at 06:15
  • @ChthonicProject Oh my god. That is such a simple and excellent idea. Please put that in as an answer so I can choose that. – user3108505 Feb 08 '14 at 06:17

4 Answers

4

You could use a Map<Integer, List<String>>, where the keys are the line numbers in the CSV file and each List holds the words in that line.
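As a rough illustration of that idea, here is a minimal sketch. The class name `LineMap` and the method `index` are hypothetical names chosen for this example, line numbers are assumed to be 0-based, and the lines are assumed to be plain comma-separated text with no quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
class LineMap {
    // Builds a map from 0-based line number to the words on that line.
    static Map<Integer, List<String>> index(List<String> lines) {
        Map<Integer, List<String>> byLine = new HashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            // Copy into an ArrayList so each row is resizable and get(int) is O(1).
            List<String> words = new ArrayList<>(Arrays.asList(lines.get(i).split(",")));
            byLine.put(i, words);
        }
        return byLine;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "0,MONEY",
                "3KEET95TGY,05/31/2011,04/17/2012,120050");
        Map<Integer, List<String>> byLine = index(lines);
        // Line 1, word 3 (both 0-based).
        System.out.println(byLine.get(1).get(3)); // prints 120050
    }
}
```

Each row can have its own length, so nothing needs to be known about the widest line in advance.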

An additional point: you will probably end up calling List#get(int) quite often. If so, do not use a LinkedList, because get(int) on a linked list is O(n). I think an ArrayList is your best option here.

Edit (based on AlexWien's observation):

In this particular case, since the keys are line numbers and therefore form a contiguous range of integers, an even better data structure could be ArrayList<ArrayList<String>>. This gives direct index-based access to each line, with no key lookup.
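A minimal sketch of the list-of-lists variant follows, using a few lines from the question as sample input. `RaggedRows` and `parse` are hypothetical names for this example, and the sketch assumes simple comma-separated lines without quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class RaggedRows {
    // Outer index = line number, inner index = word position within that line.
    static ArrayList<ArrayList<String>> parse(List<String> lines) {
        ArrayList<ArrayList<String>> rows = new ArrayList<>(lines.size());
        for (String line : lines) {
            ArrayList<String> words = new ArrayList<>(Arrays.asList(line.split(",")));
            rows.add(words);
        }
        return rows;
    }

    public static void main(String[] args) {
        ArrayList<ArrayList<String>> rows = parse(Arrays.asList(
                "0,MONEY",
                "DIRECTOR,3DENT95VGY,EBAD,SAGHAR,MALE,05/31/2011,null,0,10000,07/24/2011",
                "3KEET95TGY,05/31/2011,04/17/2012,120050"));
        System.out.println(rows.get(0).size()); // prints 2
        System.out.println(rows.get(1).size()); // prints 10
        System.out.println(rows.get(2).get(0)); // prints 3KEET95TGY
    }
}
```

Here `rows.get(i)` plays the role that `map.get(i)` plays in the Map version, but it is a plain array index under the hood.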

Chthonic Project
  • Why a Map? Why not just an ArrayList<ArrayList<String>>? – AlexWien Jul 14 '14 at 20:00
  • A map is better suited for random line retrievals. If that is not a satisfactory answer, one could just as easily ask the converse: why an ArrayList<ArrayList<String>> instead of a Map? – Chthonic Project Jul 16 '14 at 04:18
  • An ArrayList accesses any line in one step, like an array element[i]. That is O(1), while a map has to look the element up (about O(log N) for a TreeMap; a HashMap is O(1) on average but still pays for hashing). A map is useful for arbitrary keys, like telephone numbers or names, but not for a complete ordered key set without holes, like the numbers from 0 to N that we have with line numbers. – AlexWien Jul 16 '14 at 09:18
  • Good point! Hadn't realized that we are dealing with a set without holes. Updated the answer. – Chthonic Project Jul 17 '14 at 00:20
3

Use an ArrayList. It is backed by an array that grows dynamically.

Akshat Singhal
2

The best way is to use a CSV parser, like http://opencsv.sourceforge.net/. This parser uses a List of String[] to hold the data.

Evgeniy Dorofeev
  • I like that link, it's useful, but it didn't answer my question. I mentioned that reading from a CSV file isn't my problem. My problem is trying to figure out what would be the best data structure to hold the words and also know the index of each word. – user3108505 Feb 08 '14 at 06:03
  • opencsv uses List of String[] – Evgeniy Dorofeev Feb 08 '14 at 06:07
0

Use a List<String>, which can expand dynamically in size.

If you want to have 2 dimensions, use a List<List<String>>.

Here's an example:

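import java.util.*;  // ArrayList, Arrays, List

List<List<String>> data = new ArrayList<>();
// someString is one line of the file; Arrays.asList returns a fixed-size list.
List<String> temp = Arrays.asList(someString.split(","));
data.add(temp);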

Put the last two statements in a loop over the lines of the file, and read values back with data.get(line).get(word).
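One way the loop might look is sketched below. `CsvLoop` and `readAll` are hypothetical names for this example; the sketch reads from any `Reader` (a `StringReader` here, a `FileReader` in practice) and assumes plain comma-separated lines without quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
class CsvLoop {
    // Reads every line from the source and splits it into a resizable list of words.
    static List<List<String>> readAll(Reader source) {
        List<List<String>> data = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A limit of -1 keeps trailing empty fields such as "a,b,".
                List<String> words = new ArrayList<>(Arrays.asList(line.split(",", -1)));
                data.add(words);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data;
    }

    public static void main(String[] args) {
        String sample = "0,MONEY\n1,SELLING\nXKQ8P6CW,09/28/2012,null,130900\n";
        List<List<String>> data = readAll(new StringReader(sample));
        for (List<String> row : data) {
            System.out.println(row.size() + " words: " + row);
        }
    }
}
```

Wrapping the result of `Arrays.asList` in a new `ArrayList` is optional; it only matters if rows need to be modified later, since `Arrays.asList` alone returns a fixed-size list.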

iptq