I have a class called KeyLabelDistance that I pass as both the key and the value in Hadoop. I want to perform a secondary sort on it, i.e. sort first by increasing value of key and then, for equal keys, in DECREASING order of distance.

To do this I need to write my own GroupingComparator. Since setGroupingComparator() only accepts a RawComparator implementation, how do I perform the comparison in the grouping comparator in terms of bytes? Do I need to explicitly serialize and deserialize the objects? Also, does having KeyLabelDistance implement WritableComparable, as shown here, make a separate SortComparator redundant?

I learned about the roles of SortComparator and GroupComparator from this answer: What are the differences between Sort Comparator and Group Comparator in Hadoop?

Following is the implementation of KeyLabelDistance:

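import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class KeyLabelDistance implements WritableComparable<KeyLabelDistance>
{
    private int key;
    private int label;
    private double distance;

    // Hadoop creates key instances reflectively, so a no-arg constructor is required.
    public KeyLabelDistance()
    {
        this(0, 0, 0.0);
    }

    public KeyLabelDistance(int key, int label, double distance)
    {
        this.key = key;
        this.label = label;
        this.distance = distance;
    }

    public int getKey() {
        return key;
    }
    public void setKey(int key) {
        this.key = key;
    }
    public int getLabel() {
        return label;
    }
    public void setLabel(int label) {
        this.label = label;
    }
    public double getDistance() {
        return distance;
    }
    public void setDistance(double distance) {
        this.distance = distance;
    }

    // Serialize the fields; readFields must read them back in the same order.
    @Override
    public void write(DataOutput out) throws IOException
    {
        out.writeInt(key);
        out.writeInt(label);
        out.writeDouble(distance);
    }

    @Override
    public void readFields(DataInput in) throws IOException
    {
        key = in.readInt();
        label = in.readInt();
        distance = in.readDouble();
    }

    // The interface method takes a single argument and compares it with this object.
    @Override
    public int compareTo(KeyLabelDistance other)
    {
        // Primary order: key ascending.
        int byKey = Integer.compare(key, other.key);
        if (byKey != 0)
            return byKey;
        // If the keys are equal, look at the distances -> the larger the "distance",
        // the greater the "similarity", so the operands are swapped to sort descending.
        return Double.compare(other.distance, distance);
    }

    // The default HashPartitioner uses hashCode(); hashing on key alone sends
    // all records that share a key to the same reducer.
    @Override
    public int hashCode()
    {
        return Integer.hashCode(key);
    }
}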

The code for the group comparator is as follows:

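import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class KeyLabelDistanceGroupingComparator extends WritableComparator
{
    // Explicit constructor: leaving it out produces the
    // "cannot find symbol: constructor WritableComparator()" error.
    // Passing true lets the superclass create key instances for deserialization.
    public KeyLabelDistanceGroupingComparator()
    {
        super(KeyLabelDistance.class, true);
    }

    // Group on the key field only; label and distance are ignored.
    // The parameter types must match the superclass method for this to be an override.
    @Override
    public int compare(WritableComparable lhs, WritableComparable rhs)
    {
        KeyLabelDistance a = (KeyLabelDistance) lhs;
        KeyLabelDistance b = (KeyLabelDistance) rhs;
        return Integer.compare(a.getKey(), b.getKey());
    }
}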

Any help is appreciated. Thanks in advance.

user3377770

1 Answer

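You can extend WritableComparator, which in turn implements RawComparator. Both your sorting and grouping comparators can extend WritableComparator. When it is constructed with createInstances set to true, its byte-level compare deserializes both keys and delegates to compare(WritableComparable, WritableComparable), so you do not have to serialize or deserialize anything yourself.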
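If you do want to compare in terms of bytes, the idea can be sketched without any Hadoop classes. The following is only an illustration under assumptions: it assumes the record is serialized as an int key, an int label and a double distance in that order, big-endian (the layout DataOutput produces), and the class and method names are hypothetical. In a real comparator, WritableComparator.readInt(bytes, start) does the job of the ByteBuffer call here.

```java
import java.nio.ByteBuffer;

// Standalone sketch (no Hadoop dependency): compare two serialized records
// by looking only at the leading int key inside each byte array.
class RawKeyCompareSketch {

    // Assumed layout: 4-byte key, 4-byte label, 8-byte distance, big-endian.
    static byte[] serialize(int key, int label, double distance) {
        return ByteBuffer.allocate(16)
                .putInt(key)
                .putInt(label)
                .putDouble(distance)
                .array();
    }

    // Same parameter shape as RawComparator.compare(b1, s1, l1, b2, s2, l2):
    // each record lives in a byte array at offset s with length l.
    static int compareKeys(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int k1 = ByteBuffer.wrap(b1, s1, l1).getInt(); // first 4 bytes = key
        int k2 = ByteBuffer.wrap(b2, s2, l2).getInt();
        return Integer.compare(k1, k2);
    }

    public static void main(String[] args) {
        byte[] a = serialize(1, 7, 0.9);
        byte[] b = serialize(1, 3, 0.2);
        byte[] c = serialize(2, 7, 0.9);
        // Same key, different label and distance: treated as equal for grouping.
        System.out.println(compareKeys(a, 0, a.length, b, 0, b.length));
        // Smaller key sorts first, so the result is negative.
        System.out.println(compareKeys(a, 0, a.length, c, 0, c.length));
    }
}
```

Comparing raw bytes avoids creating objects for every comparison, at the cost of tying the comparator to the exact field order used during serialization.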

If you do not provide these comparators, Hadoop falls back to the compareTo method of the Writable that is your key, both for sorting and for grouping.
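To see why the two comparators differ for a secondary sort, here is a small plain-Java simulation of what the reducer side ends up with. It is a sketch under assumptions, not Hadoop code: Rec is a hypothetical stand-in for KeyLabelDistance, SORT plays the role of the sort comparator (or compareTo), and GROUP plays the role of the grouping comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of "sort with one comparator, group with another".
class SecondarySortSketch {

    // Hypothetical stand-in for KeyLabelDistance.
    static final class Rec {
        final int key;
        final int label;
        final double distance;

        Rec(int key, int label, double distance) {
            this.key = key;
            this.label = label;
            this.distance = distance;
        }
    }

    // Sort order: key ascending, then distance descending.
    static final Comparator<Rec> SORT = (a, b) -> {
        int byKey = Integer.compare(a.key, b.key);
        return byKey != 0 ? byKey : Double.compare(b.distance, a.distance);
    };

    // Grouping order: key only, so records with the same key share a group.
    static final Comparator<Rec> GROUP = (a, b) -> Integer.compare(a.key, b.key);

    // Sorts a copy of the input, then starts a new group whenever GROUP
    // says the current record differs from the previous one.
    static List<List<Rec>> sortAndGroup(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(SORT);
        List<List<Rec>> groups = new ArrayList<>();
        Rec previous = null;
        for (Rec r : sorted) {
            if (previous == null || GROUP.compare(previous, r) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(r);
            previous = r;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(
                new Rec(2, 1, 0.5), new Rec(1, 1, 0.2),
                new Rec(1, 2, 0.9), new Rec(2, 3, 0.7));
        for (List<Rec> group : sortAndGroup(input)) {
            StringBuilder line = new StringBuilder("key " + group.get(0).key + ":");
            for (Rec r : group) {
                line.append(' ').append(r.distance);
            }
            System.out.println(line);
        }
    }
}
```

If grouping used the full sort order instead, every distinct (key, distance) pair would form its own group, which is why a key-only grouping comparator is needed on top of compareTo.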

Venkat
  • Thank you, I tried that. I have now added the code for the group comparator to my question, but I get the following error: KeyLabelDistanceGroupingComparator.java:3: cannot find symbol symbol : constructor WritableComparator() location: class org.apache.hadoop.io.WritableComparator – user3377770 Apr 03 '14 at 05:31
  • When you extend a class in Java and the superclass does not have a default constructor, this is the error you get. Create a constructor in your code and have it invoke super(...), e.g.: XYZKeyValueComparator() { super(MyWritable.class, true); } – Venkat Apr 03 '14 at 12:26