3

How to sort CSV file by two columns? Right now I am able to sort it by one column. I need to sort it by first two columns. How to do it? Here is the code that I am using for sorting it by its first column:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class Practice {
    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new FileReader("sample-input.csv"));
        Map<String, List<String>> map = new TreeMap<String, List<String>>();
        String line = reader.readLine();//read header
        while ((line = reader.readLine()) != null) {
            String key = getField(line);
            List<String> l = map.get(key);
            if (l == null) {
                l = new LinkedList<String>();
                map.put(key, l);
            }
            l.add(line);
        }

        reader.close();
        FileWriter writer = new FileWriter("output.csv");
        writer.write("Symbol, Exchange, Minimum, Average, Maximum, Total\n");
        for (List<String> list : map.values()) {
            for (String val : list) {
                writer.write(val);
                writer.write("\n");
            }
        }
        writer.close();
    }

    private static String getField(String line) {
        return line.split(",")[0];
        // extract value you want to sort on
    }
}

EDIT: The output after two column sort becomes like:

ABC,X,0.10,10
ABC,X,0.09,20
ABC,X,0.11,10
ABC,X,0.11,20
ABC,X,0.10,10
ABC,Y,0.09,10
ABC,Y,0.08,10
ABC,Z,0.12,15
ABC,Z,0.10,15
DEF,X,0.17,10
DEF,X,0.14,10
DEF,Y,0.15,15
DEF,Y,0.15,15
DEF,Y,0.17,15
DEF,Y,0.16,15
DEF,Y,0.17,15
DEF,Z,0.14,10
DEF,Z,0.15,10

I need the output like this:

ABC,X,0.09,0.11
ABC,X,0.09,0.11
ABC,X,0.09,0.11
ABC,X,0.09,0.11
ABC,X,0.09,0.11
ABC,Y,0.08,0.9
ABC,Y,0.08,0.9
ABC,Z,0.10,0.12
ABC,Z,0.10,0.12
DEF,X,0.14,0.17
DEF,X,0.14,0.17
DEF,Y,0.15,0.17
DEF,Y,0.15,0.17
DEF,Y,0.15,0.17
DEF,Y,0.15,0.17
DEF,Y,0.15,0.17
DEF,Z,0.14,0.15
DEF,Z,0.14,0.15

But, I want the third column to display the minimum of the values for X, then min value for Y, and then Z from the values that are currently being displayed in the third column.

Joseph Wong
  • 83
  • 2
  • 7

1 Answers1

5

Although it's often a good idea to create classes that serve as appropriate representations for the domain: In this case, I disagree with the comments.

Reading a CSV and sorting it by (the String contents of) one or more columns is a very generic operation. And it is independent of the domain.

It is possible to implement a Comparator that simply picks the strings at several indices of a List<String>, and compares the values at these indices lexicographically. With this Comparator, one can sort any List<List<String>> that was read from a CSV file.

Here is a simple example. It can be used to sort any CSV file based on the string contents of arbitrary columns.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MultiColumnCsvSort
{
    private static final String COLUMN_SEPARATOR = ",";

    public static void main(String[] args) throws Exception
    {
        InputStream inputStream = new FileInputStream("sample-input.csv");
        List<List<String>> lines = readCsv(inputStream);

        // Create a comparator that sorts primarily by column 0,
        // and if these values are equal, by column 2
        Comparator<List<String>> comparator = createComparator(0, 2);
        Collections.sort(lines, comparator);

        OutputStream outputStream = new FileOutputStream("output.csv");
        String header = "Symbol, Exchange, Minimum, Average, Maximum, Total";
        writeCsv(header, lines, outputStream);        
    }

    private static List<List<String>> readCsv(
        InputStream inputStream) throws IOException
    {
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(inputStream));
        List<List<String>> lines = new ArrayList<List<String>>();

        // Skip header
        String line = reader.readLine();

        while (true)
        {
            line = reader.readLine();
            if (line == null)
            {
                break;
            }
            List<String> list = Arrays.asList(line.split(COLUMN_SEPARATOR));
            lines.add(list);
        }
        return lines;
    }

    private static void writeCsv(
        String header, List<List<String>> lines, OutputStream outputStream) 
        throws IOException
    {
        Writer writer = new OutputStreamWriter(outputStream);
        writer.write(header+"\n");
        for (List<String> list : lines)
        {
            for (int i = 0; i < list.size(); i++)
            {
                writer.write(list.get(i));
                if (i < list.size() - 1)
                {
                    writer.write(COLUMN_SEPARATOR);
                }
            }
            writer.write("\n");
        }
        writer.close();

    }

    private static <T extends Comparable<? super T>> Comparator<List<T>> 
        createComparator(int... indices)
    {
        return createComparator(MultiColumnCsvSort.<T>naturalOrder(), indices);
    }

    private static <T extends Comparable<? super T>> Comparator<T>
        naturalOrder()
    {
        return new Comparator<T>()
        {
            @Override
            public int compare(T t0, T t1)
            {
                return t0.compareTo(t1);
            }
        };
    }

    private static <T> Comparator<List<T>> createComparator(
        final Comparator<? super T> delegate, final int... indices)
    {
        return new Comparator<List<T>>()
        {
            @Override
            public int compare(List<T> list0, List<T> list1)
            {
                for (int i = 0; i < indices.length; i++)
                {
                    T element0 = list0.get(indices[i]);
                    T element1 = list1.get(indices[i]);
                    int n = delegate.compare(element0, element1);
                    if (n != 0)
                    {
                        return n;
                    }
                }
                return 0;
            }
        };
    }
}

Updated a few years later:

If you want more flexibility regarding the sort order of individual columns, there are different options. Which one is the "best" one largely depends on how you want to "assemble" the actual comparator - that is, how you want to define which column should be sorted in which order. But one simple example is shown here:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MultiColumnCsvSortExtended
{
    private static final String COLUMN_SEPARATOR = ",";

    public static void main(String[] args) throws Exception
    {
        InputStream inputStream = new FileInputStream("sample-input.csv");
        List<List<String>> lines = readCsv(inputStream);

        // Create a comparator that compares the elements from column 0,
        // in ascending order
        Comparator<List<String>> c0 = createAscendingComparator(0);

        // Create a comparator that compares the elements from column 2,
        // in descending order
        Comparator<List<String>> c1 = createDesendingComparator(2);

        // Create a comparator that compares primarily by using c0,
        // and secondarily by using c1
        Comparator<List<String>> comparator = createComparator(c0, c1);
        Collections.sort(lines, comparator);

        OutputStream outputStream = new FileOutputStream("output.csv");
        String header = "Symbol, Exchange, Minimum, Average";
        writeCsv(header, lines, outputStream);        
    }

    private static List<List<String>> readCsv(
        InputStream inputStream) throws IOException
    {
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(inputStream));
        List<List<String>> lines = new ArrayList<List<String>>();

        String line = null;

        // Skip header
        line = reader.readLine();

        while (true)
        {
            line = reader.readLine();
            if (line == null)
            {
                break;
            }
            List<String> list = Arrays.asList(line.split(COLUMN_SEPARATOR));
            lines.add(list);
        }
        return lines;
    }

    private static void writeCsv(
        String header, List<List<String>> lines, OutputStream outputStream) 
        throws IOException
    {
        Writer writer = new OutputStreamWriter(outputStream);
        writer.write(header+"\n");
        for (List<String> list : lines)
        {
            for (int i = 0; i < list.size(); i++)
            {
                writer.write(list.get(i));
                if (i < list.size() - 1)
                {
                    writer.write(COLUMN_SEPARATOR);
                }
            }
            writer.write("\n");
        }
        writer.close();

    }

    @SafeVarargs
    private static <T> Comparator<T>
        createComparator(Comparator<? super T>... delegates)
    {
        return (t0, t1) -> 
        {
            for (Comparator<? super T> delegate : delegates)
            {
                int n = delegate.compare(t0, t1);
                if (n != 0)
                {
                    return n;
                }
            }
            return 0;
        };
    }

    private static <T extends Comparable<? super T>> Comparator<List<T>>
        createAscendingComparator(int index)
    {
        return createListAtIndexComparator(Comparator.naturalOrder(), index);
    }

    private static <T extends Comparable<? super T>> Comparator<List<T>>
        createDesendingComparator(int index)
    {
        return createListAtIndexComparator(Comparator.reverseOrder(), index);
    }

    private static <T> Comparator<List<T>>
        createListAtIndexComparator(Comparator<? super T> delegate, int index)
    {
        return (list0, list1) -> 
            delegate.compare(list0.get(index), list1.get(index));
    }

}
Marco13
  • 53,703
  • 9
  • 80
  • 159
  • Anyway to read the entire file but write only the first two sorted columns in the output file, and write values individually in the cells of the third column? – Joseph Wong Oct 19 '14 at 09:28
  • I'm, not sure what you mean by "* write values individually in the cells of the third column*", but: It's easy to modify the `writeCsv` method to receive a list/array of `int` values that indicate which columns should be written, similar to the usage of the `final int... indices` in the `createComparator` method. I could add such a method, but the class name originally was `Practice`, which indicates that it might be a good idea to try this on your own, and possibly come back with more specific questions ;-) – Marco13 Oct 19 '14 at 09:38
  • Hi Marco13, I edited my question, to be more specific, to make you understand what exactly I'm asking. :) – Joseph Wong Oct 19 '14 at 10:49
  • If I understood you correctly, you just have to add *further* sorting criteria. You can change the call in my example to `createComparator(0, 1, 2, 3);` to sort based on **all** columns, lexicographically. – Marco13 Oct 19 '14 at 11:23
  • No no @Marco13 ! You didn't understand. :) I edited the question to add the output that I actually want. The third column is displaying only the minimum value from column 2 of input table (the first table in the question) when 1st column is ABC and 2nd column is X, then when it is ABC and Y, then only the min value of that from 2nd column of input table and so on. Did you understand now? :( – Joseph Wong Oct 19 '14 at 22:17
  • Hi @Marco13 could you help me at this? – Joseph Wong Oct 20 '14 at 03:17
  • That's a bit more code, and fairly unrelated to the original question. You might consider asking this as a *new* question, because it is not related to sorting any more. (If there are more such "post-processing steps", then it might also be more convenient to create a dedicated class for the table rows, although it's not strictly necessary) – Marco13 Oct 20 '14 at 10:35
  • @arvin_codeHunk There are many degrees of freedom for the implementation of this. I added one example as an update to the answer. Other options are possible. But in general, you'd probably create some sort of comparators for the desired columns, and then combine these comparators into one that simply does a lexicographical comparison using other comparators. – Marco13 Jul 22 '17 at 00:36
  • @arvin_codeHunk Sorry, stackoverflow is not a "support forum". Translating the given code from Java 8 to Java 7 should be easy. Beyond that, if you have a specific question, you could try to ask it separately (but prepare for being downvoted: The question "why isn't my code working" is *not* an appropriate question for stackoverflow) – Marco13 Jul 22 '17 at 13:59