
I am trying to sort a tab-delimited file by the values of three columns, but it does not sort correctly. How can I solve this problem? I used the code from the linked page.

The output is this:

    clueweb09-en0000-12-00000   10722   10732   0.995358    0.000336    /m/0cbx95
    clueweb09-en0000-12-00000   10736   10746   0.950789    0.000697    /m/01n7q
    clueweb09-en0000-12-00000   11230   11237   0.829546    0.000291    /m/03jm5
    clueweb09-en0000-12-00000   13009   13024   0.540326    0.000085    /m/012qgt
    clueweb09-en0000-12-00000   13050   13060   0.950789    0.000697    /m/01n7q
    clueweb09-en0000-12-00000   1338    1348    0.950789    0.000697    /m/01n7q
    clueweb09-en0000-12-00000   1864    1874    0.950789    0.000697    /m/01n7q
    clueweb09-en0000-12-00000   2018    2028    0.950789    0.000697    /m/01n7q
    clueweb09-en0000-12-00000   2745    2752    0.78671     0.000722    /m/02jx1
    clueweb09-en0000-12-00000   2823    2829    0.956747    0.000476    /m/04jpl
    clueweb09-en0000-12-00000   2856    2862    0.649632    0.000007    /m/0gs0g

I want this output:

    clueweb09-en0000-12-00000   1338    1348    0.950789    0.000697    /m/01n7q
    clueweb09-en0000-12-00000   1864    1874    0.950789    0.000697    /m/01n7q
    clueweb09-en0000-12-00000   2018    2028    0.950789    0.000697    /m/01n7q
    clueweb09-en0000-12-00000   2745    2752    0.78671     0.000722    /m/02jx1
    clueweb09-en0000-12-00000   2823    2829    0.956747    0.000476    /m/04jpl
    clueweb09-en0000-12-00000   2856    2862    0.649632    0.000007    /m/0gs0g
    clueweb09-en0000-12-00000   10722   10732   0.995358    0.000336    /m/0cbx95
    clueweb09-en0000-12-00000   10736   10746   0.950789    0.000697    /m/01n7q
    clueweb09-en0000-12-00000   11230   11237   0.829546    0.000291    /m/03jm5
    clueweb09-en0000-12-00000   13009   13024   0.540326    0.000085    /m/012qgt
    clueweb09-en0000-12-00000   13050   13060   0.950789    0.000697    /m/01n7q
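The out-of-order rows come from comparing the offset column as text. `String.compareTo` compares character by character, so `"13050"` sorts before `"1338"` (at the third character, `'0'` < `'3'`). The sketch below illustrates this with a few values from the second column; the class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class StringVsNumericOrder {
    // Sorts the values as Strings: character-by-character (lexicographic) comparison.
    static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts the same values by their integer value instead.
    static List<String> sortAsIntegers(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt(Integer::parseInt));
        return copy;
    }

    public static void main(String[] args) {
        List<String> starts = Arrays.asList("1338", "10722", "2745", "13050");
        System.out.println(sortAsStrings(starts));  // [10722, 13050, 1338, 2745]
        System.out.println(sortAsIntegers(starts)); // [1338, 2745, 10722, 13050]
    }
}
```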
user3092781
  • The answer to your linked question clearly states that it is comparing columns as strings. You need to follow the advice in the question’s comments, and create a data class that represents each row. – VGR Apr 13 '17 at 14:00

3 Answers


Write your own comparator for the columns you want to sort. As Walter said, you currently get the data as strings. Convert each row into an object, put those objects in a list, and sort the list using your comparator. Hope this helps.
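A minimal sketch of this idea, assuming only the first three columns matter for sorting. The `Row` class, its field names and `ROW_ORDER` are hypothetical. It also covers the follow-up question about handling integers and strings: a single comparator is enough, because each field is compared with the comparison that suits its own type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RowComparatorSketch {
    // Hypothetical typed row: only the three sort columns are modelled here.
    static final class Row {
        final String docId; // column 0, compared as text
        final int start;    // column 1, compared as a number
        final int end;      // column 2, compared as a number

        Row(String docId, int start, int end) {
            this.docId = docId;
            this.start = start;
            this.end = end;
        }
    }

    // One comparator, three keys: a String comparison followed by two int comparisons.
    static final Comparator<Row> ROW_ORDER = new Comparator<Row>() {
        @Override
        public int compare(Row a, Row b) {
            int byDoc = a.docId.compareTo(b.docId);
            if (byDoc != 0) {
                return byDoc;
            }
            int byStart = Integer.compare(a.start, b.start);
            if (byStart != 0) {
                return byStart;
            }
            return Integer.compare(a.end, b.end);
        }
    };

    // Sorts a copy of the rows with ROW_ORDER and returns their start offsets.
    static List<Integer> sortedStarts(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(ROW_ORDER);
        List<Integer> starts = new ArrayList<>();
        for (Row r : copy) {
            starts.add(r.start);
        }
        return starts;
    }

    public static void main(String[] args) {
        String doc = "clueweb09-en0000-12-00000";
        List<Row> rows = Arrays.asList(
                new Row(doc, 10722, 10732),
                new Row(doc, 1338, 1348),
                new Row(doc, 2745, 2752));
        System.out.println(sortedStarts(rows)); // [1338, 2745, 10722]
    }
}
```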

Porkko M
  • I know the problem is that it compares strings instead of integers, but I do not know how to write a comparator that handles both integers and strings. I mean, how can I have two comparators? Could you write some code? – user3092781 Apr 13 '17 at 13:59

At the moment the numbers are treated as strings. Strings are compared from the left, character by character, as if the numbers were left-aligned, and they are sorted that way. You need to change the compare function to handle this, or, when you read the file, transform those numbers into right-aligned strings.
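A sketch of the "right-aligned strings" option. It assumes the offsets are non-negative integers with at most `WIDTH` digits (10 is an arbitrary, hypothetical bound). Left-padding with spaces works because a space (U+0020) sorts before every digit, so after padding, plain String order matches numeric order. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RightAlignSketch {
    // Assumed upper bound on the number of digits in an offset column.
    static final int WIDTH = 10;

    // Pads a non-negative integer string with leading spaces to WIDTH characters.
    static String rightAlign(String number) {
        return String.format("%" + WIDTH + "s", number.trim());
    }

    // Sorts numeric strings by comparing their right-aligned forms as ordinary Strings.
    static List<String> sortRightAligned(List<String> numbers) {
        List<String> padded = new ArrayList<>();
        for (String n : numbers) {
            padded.add(rightAlign(n));
        }
        Collections.sort(padded); // plain String comparison
        List<String> result = new ArrayList<>();
        for (String p : padded) {
            result.add(p.trim()); // remove the padding again for display
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> starts = Arrays.asList("10722", "1338", "13050", "2745");
        System.out.println(sortRightAligned(starts)); // [1338, 2745, 10722, 13050]
    }
}
```

Padding only works while every value fits in the chosen width and none is negative. Comparing parsed integers, as the other answers suggest, avoids both limits.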

Walter Palladino
  • Could you explain in more detail how I should change the compare function? Do you mean I need two compare functions, one for strings and one for integers? – user3092781 Apr 13 '17 at 13:51
  • Check it out here, I believe it helps: http://stackoverflow.com/questions/369512/how-to-compare-objects-by-multiple-fields – Porkko M Apr 13 '17 at 14:01
  • In the code you referred to, every element is read as a String when the CSV file is loaded, and the comparator function assumes you have an array of Comparable objects. The proper way is not to load the data into a list of Strings, but to create an object and parse the data from the file into it properly. Then create a comparator like the one in Porkko's example. – Walter Palladino Apr 13 '17 at 14:08
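A sketch of the "parse into an object first" approach, under the assumption that the file really is tab-separated and that columns 1 and 2 always hold integers. The `Entry` class, its fields and the method names are hypothetical. Each line is parsed once, the sort keys become typed fields, and the original line is kept so it can be written back unchanged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ParseAndSortSketch {
    // Hypothetical typed row: sort columns parsed once, raw line kept for output.
    static final class Entry {
        final String docId; // column 0
        final int start;    // column 1
        final int end;      // column 2
        final String line;  // original text, written back unchanged

        Entry(String docId, int start, int end, String line) {
            this.docId = docId;
            this.start = start;
            this.end = end;
            this.line = line;
        }
    }

    // Parses one tab-delimited line; assumes columns 1 and 2 hold integers.
    static Entry parse(String line) {
        String[] cols = line.split("\t");
        return new Entry(cols[0],
                Integer.parseInt(cols[1].trim()),
                Integer.parseInt(cols[2].trim()),
                line);
    }

    // Parses every non-empty line, sorts by (docId, start, end), returns the original lines.
    static List<String> sortLines(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {
                entries.add(parse(line));
            }
        }
        entries.sort(Comparator.comparing((Entry e) -> e.docId)
                .thenComparingInt(e -> e.start)
                .thenComparingInt(e -> e.end));
        List<String> sorted = new ArrayList<>();
        for (Entry e : entries) {
            sorted.add(e.line);
        }
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        // Reads the file named on the command line, or falls back to two sample rows.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "clueweb09-en0000-12-00000\t10722\t10732\t0.995358\t0.000336\t/m/0cbx95",
                        "clueweb09-en0000-12-00000\t1338\t1348\t0.950789\t0.000697\t/m/01n7q");
        for (String line : sortLines(lines)) {
            System.out.println(line);
        }
    }
}
```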

I solved the problem by changing the compare function from the linked page to the following code:

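    import java.util.Comparator;
    import java.util.List;

    // Inside the class that sorts the rows:
    // compares rows by the column at indices[0] using the given delegate and,
    // on a tie, by the column at indices[1] parsed as an integer.
    // Any further indices are ignored.
    private static <T> Comparator<List<T>> createComparator(
            final Comparator<? super T> delegate, final int... indices)
    {
        return new Comparator<List<T>>()
        {
            @Override
            public int compare(List<T> list0, List<T> list1)
            {
                // Primary key (e.g. the document id), compared with the delegate
                T element0 = list0.get(indices[0]);
                T element1 = list1.get(indices[0]);
                int n = delegate.compare(element0, element1);
                if (n != 0)
                {
                    return n;
                }
                // Secondary key, compared numerically so that 1338 sorts before 10722
                return Integer.compare(
                        Integer.parseInt(list0.get(indices[1]).toString()),
                        Integer.parseInt(list1.get(indices[1]).toString()));
            }
        };
    }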
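For comparison, a similar ordering can be written with the Java 8 `Comparator` combinators, without an anonymous class. This sketch assumes fixed column positions (0 as text, then 1 and 2 as integers). Unlike `createComparator` above, it also breaks ties on the third column. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ListColumnComparatorSketch {
    // Column 0 as text, then columns 1 and 2 as integers (parsed on every comparison).
    static final Comparator<List<String>> BY_DOC_THEN_OFFSETS =
            Comparator.comparing((List<String> row) -> row.get(0))
                    .thenComparingInt(row -> Integer.parseInt(row.get(1).trim()))
                    .thenComparingInt(row -> Integer.parseInt(row.get(2).trim()));

    // Sorts a copy of the rows and returns column 1 of each row in sorted order.
    static List<String> sortedStarts(List<List<String>> rows) {
        List<List<String>> copy = new ArrayList<>(rows);
        copy.sort(BY_DOC_THEN_OFFSETS);
        List<String> starts = new ArrayList<>();
        for (List<String> row : copy) {
            starts.add(row.get(1));
        }
        return starts;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("clueweb09-en0000-12-00000", "13050", "13060"),
                Arrays.asList("clueweb09-en0000-12-00000", "1338", "1348"),
                Arrays.asList("clueweb09-en0000-12-00000", "2745", "2752"));
        System.out.println(sortedStarts(rows)); // [1338, 2745, 13050]
    }
}
```

Because `Integer.parseInt` runs on every comparison, parsing each row once into a typed object, as suggested in the comments above, avoids the repeated work on large files.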
user3092781