4

I want to read a CSV file in Java and sort it using a particular column. My CSV file looks like this:

 ABC,DEF,11,GHI....
 JKL,MNO,10,PQR....
 STU,VWX,12,XYZ....

Considering I want to sort it using the third column, my output should look like:

 JKL,MNO,10,PQR....
 ABC,DEF,11,GHI....
 STU,VWX,12,XYZ....

After some research on what data structure to use to hold the data of CSV, people here suggested to use Map data structure with Integer and List as key and value pairs in this question:

 Map<Integer, List<String>>
 where the value, List<String> = {[ABC,DEF,11,GHI....], [JKL,MNO,10,PQR....],[STU,VWX,12,XYZ....]...}
 And the key will be an auto-incremented integer starting from 0.

So could anyone please suggest a way to sort this Map using an element in the 'List' in Java? Also if you think this choice of data structure is bad, please feel free to suggest an easier data structure to do this.

Thank you.

Community
  • 1
  • 1
Srikanth Kandalam
  • 965
  • 3
  • 15
  • 26
  • The Map was a bad idea, I downvoted the answer in the link you gave. A List of List is the way to go here. – AlexWien Jul 14 '14 at 20:01
  • 1
    Note: when you compare Strings, `"100" < "2"` as `'1' < '2'` If you want a different behaviour you will need to change the Comparator. – Peter Lawrey Jul 14 '14 at 20:02
  • @AlexWien Could you please explain why? Is it because we cannot sort the Map easily? – Srikanth Kandalam Jul 14 '14 at 20:02
  • 1
    The List of List answers will work, but you need to be sure you create a list for each column instead of each row for the sorting to work. Your example shows the list to contain the data in each row. A little trickier to do as you read a CSV file but not too bad. – Jason Nesbitt Jul 14 '14 at 20:03
  • @PeterLawrey: Is it because "100", "2" etc are considered as String by default? – Srikanth Kandalam Jul 14 '14 at 20:03
  • 1
    A csv file is a sequence of lines without a key. An ArrayList is the perfect fit, it can access the line directly by giving the position = linenumber. A Map needs more time to finde the line, to search for key linenumber. Maps are used for other purposes. – AlexWien Jul 14 '14 at 20:09
  • @JasonNesbitt List of rows will work. you compare position 5 of row i with pos 5 of row j. The comparator takes as argumnet two rows (the arrayList of columns) – AlexWien Jul 14 '14 at 20:12
  • @AlexWien You are right. Thanks for pointing that out. Probably the better way to go since it would make reading the data in easier. – Jason Nesbitt Jul 14 '14 at 20:15

4 Answers4

4

I would use an ArrayList of ArrayList of String:

ArrayList<ArrayList<String>>

Each entry is one line, which is a list of strings. You initialize the list by:

List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>();

To get the nth line:

List<String> line = csvLines.get(n);

To sort you write a custom Comparator. In the Constructor of that comparator you can pass the field position used to sort.

The compare method then gets the String value on stored position and converts it to a primitive ava type depending on the position. E.g you know that at position 2 in the csv there is an Integer, then convert the String to an int. This is neccessary for corretcly sorting. You may also pass an ArrayList of Class to the constructor such that it knows which field is what type.
Then use String.compareTo() or Integer.compare(), depending on column position etc.

Edit example of working code:

List<ArrayList<String>> csvLines = new ArrayList<ArrayList<String>>();
Comparator<ArrayList<String>> comp = new Comparator<ArrayList<String>>() {
    public int compare(ArrayList<String> csvLine1, ArrayList<String> csvLine2) {
        // TODO here convert to Integer depending on field.
        // example is for numeric field 2
        return Integer.valueOf(csvLine1.get(2)).compareTo(Integer.valueOf(csvLine2.get(2)));
    }
};
Collections.sort(csvLines, comp);
AlexWien
  • 28,470
  • 6
  • 53
  • 83
  • edited and changed the compare to work with numbers. – AlexWien Jul 14 '14 at 20:19
  • I am really new to this so when I tried your code, Eclipse points it as a type mismatch error saying 'change type of csvLines to List>'. So can you please let me know if I am doing it right? – Srikanth Kandalam Jul 14 '14 at 20:22
  • 1
    ah yes, use List = new ArrayList>, this is a weakness of java – AlexWien Jul 14 '14 at 20:25
  • I was trying to pass 'csvLines' which is of type List> to 'Collections.sort()' but since sort accepts only List and not List> as one of it's arguments. Could you please help me how to proceed? – Srikanth Kandalam Jul 15 '14 at 13:17
  • sort accpets the list to be sorted and the comparator.so Collections.sort(csvLines, MyCustomComparator); – AlexWien Jul 15 '14 at 14:00
  • List> csvLines = new ArrayList>(); Comparator> comp = new Comparator>() { public int compare(ArrayList o1, ArrayList o2) { // TODO Auto-generated method stub return 0; } }; Collections.sort(csvLines, comp); – AlexWien Jul 15 '14 at 14:04
  • The above comment shows an inlined Comparator, hoiwevcer i would recommend to create an MyCsvComparator class. – AlexWien Jul 15 '14 at 14:12
4

In Java 8 you can do

SortedMap<Integer, List<String>> collect = Files.lines(Paths.get(filename))
    .collect(Collectors.groupingBy(
                                l -> Integer.valueOf(l.split(",", 4)[2]), 
                                TreeMap::new, Collectors.toList()));

Note: comparing numbers as Strings is a bad idea as "100" < "2" might not be what you expect.

I would use a sorted multi-map. If you don't have one handy you can do this.

SortedMap<Integer, List<String>> linesByKey = new TreeMap<>();

public void addLine(String line) {
    Integer key = Integer.valueOf(line.split(",", 4));
    List<String> lines = linesByKey.get(key);
    if (lines == null)
         linesByKey.put(key, lines = new ArrayList<>());
    lines.add(line);
}

This will produce a collection of lines, sorted by the number where lines with duplicate numbers have a preserved order. e.g. if all the lines have the same number, the order is unchanged.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
0

You can also use a list of lists:

List<List<String>> Llp = new ArrayList<List<String>>();

Then you need to call sort that extends a custom comparator that compares the third item in the list:

    Collections.sort(Llp, new Comparator<LinkedList<String>>() {
            @Override
            public int compare(LinkedList<String> o1, LinkedList<String> o2) {
                try {                      
                    return o1.get(2).compareTo(o2.get(2));
                } catch (IndexOutOfBoundsException e) {
                    return 0;
                }
 }
ltalhouarne
  • 4,586
  • 2
  • 22
  • 31
  • 1
    Catching an NPE is not generally a good idea, esp as there is no way this can happen reading String from a file. – Peter Lawrey Jul 14 '14 at 20:01
  • The CSV file could be missing data for whatever reason, thus possibly creating a list of only 2 elements. – ltalhouarne Jul 14 '14 at 20:03
  • 1
    So you could get an IndexOutOfBoundsException ? – Peter Lawrey Jul 14 '14 at 20:08
  • 1
    Sorting numerical values as strings will result in `"10"` to be placed before `"2"`. EDIT, ah, I see Peter already mentioned this as a comment to the question, but I think it's worth mentioning here as well... – Bart Kiers Jul 14 '14 at 20:08
  • @lolkidoki 'Collections.sort()' cannot accept 'Llp' as an argument as 'Llp' is a List of Lists while the valid argument is just a List. Could you please correct me if I am wrong? – Srikanth Kandalam Jul 14 '14 at 20:46
0

In the below code I have sorted the CSV file based on the second column.


public static void main(String[] args) throws IOException {
    String csvFile = "file_1.csv";
    String line = "";
    String cvsSplitBy = ",";
    List<List<String>> llp = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
        while ((line = br.readLine()) != null) {
            llp.add(Arrays.asList(line.split(cvsSplitBy)));
        }
        llp.sort(new Comparator<List<String>>() {
            @Override
            public int compare(List<String> o1, List<String> o2) {
                return o1.get(1).compareTo(o2.get(1));
            }
        });
        System.out.println(llp);

    } catch (IOException e) {
        e.printStackTrace();
    }
}
Nandu Raj
  • 2,072
  • 9
  • 20
Anuj
  • 1
  • 2