I am passing a class called KeyLabelDistance as both the key and the value in Hadoop, and I want to perform a secondary sort on it, i.e. I first want to sort by increasing key and then by DECREASING distance.
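To make the target order concrete, here is a plain-Java sketch of it with no Hadoop dependency. `SecondarySortOrder` and `Rec` are hypothetical names, and `Rec` is only a stand-in for KeyLabelDistance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortOrder {

    // Hypothetical stand-in for KeyLabelDistance, with no Hadoop dependency.
    static final class Rec {
        final int key;
        final int label;
        final double distance;

        Rec(int key, int label, double distance) {
            this.key = key;
            this.label = label;
            this.distance = distance;
        }

        @Override
        public String toString() {
            return "(" + key + ", " + label + ", " + distance + ")";
        }
    }

    // Primary order: key ascending; secondary order: distance descending.
    static final Comparator<Rec> ORDER =
            Comparator.<Rec>comparingInt(r -> r.key)
                    .thenComparing(Comparator.<Rec>comparingDouble(r -> r.distance).reversed());

    // Returns a sorted copy so the caller's list is left untouched.
    static List<Rec> sorted(List<Rec> input) {
        List<Rec> copy = new ArrayList<>(input);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Rec> recs = Arrays.asList(
                new Rec(2, 7, 0.1),
                new Rec(1, 3, 0.4),
                new Rec(2, 5, 0.9),
                new Rec(1, 8, 0.8));
        // prints [(1, 8, 0.8), (1, 3, 0.4), (2, 5, 0.9), (2, 7, 0.1)]
        System.out.println(sorted(recs));
    }
}
```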
To implement this secondary sort I need to write my own GroupingComparator. My question is: since setGroupingComparatorClass() only accepts a class that extends RawComparator, how do I perform the comparison in the grouping comparator in terms of bytes? Do I need to explicitly serialize and deserialize the objects? Also, does having KeyLabelDistance implement WritableComparable, as shown below, make a separate SortComparator redundant?
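As far as I understand, a RawComparator receives each key as a slice of a byte array through `compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2)`. Assuming `write()` emits the key first with `writeInt`, the first four bytes of each slice hold the key in big-endian order, so a grouping comparator could compare just those bytes. This plain-Java sketch mimics that without Hadoop; `RawKeyGrouping`, `serialize` and `compareKeys` are hypothetical names, and `readInt` reimplements what Hadoop's static `WritableComparator.readInt(byte[], int)` helper does.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RawKeyGrouping {

    // Assumed write() layout: key (int), label (int), distance (double).
    static byte[] serialize(int key, int label, double distance) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(key);
            out.writeInt(label);
            out.writeDouble(distance);
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes a big-endian int starting at 'start', as DataInput.readInt() would.
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24)
                | ((b[start + 1] & 0xff) << 16)
                | ((b[start + 2] & 0xff) << 8)
                | (b[start + 3] & 0xff);
    }

    // Same parameter shape as RawComparator.compare(b1, s1, l1, b2, s2, l2).
    // Only the leading 4-byte key is compared, so label and distance are ignored.
    static int compareKeys(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] a = serialize(5, 1, 0.9);
        byte[] b = serialize(5, 2, 0.3);
        byte[] c = serialize(7, 1, 0.9);
        System.out.println(compareKeys(a, 0, a.length, b, 0, b.length)); // 0: same group
        System.out.println(compareKeys(a, 0, a.length, c, 0, c.length)); // -1: 5 < 7
    }
}
```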
I learned the roles of the SortComparator and the GroupComparator from this answer: What are the differences between Sort Comparator and Group Comparator in Hadoop?
Following is the implementation of KeyLabelDistance:
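```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class KeyLabelDistance implements WritableComparable<KeyLabelDistance>
{
    private int key;
    private int label;
    private double distance;

    // Hadoop instantiates keys reflectively through the no-argument constructor.
    public KeyLabelDistance()
    {
        key = 0;
        label = 0;
        distance = 0;
    }

    public KeyLabelDistance(int key, int label, double distance)
    {
        this.key = key;
        this.label = label;
        this.distance = distance;
    }

    public int getKey() {
        return key;
    }

    public void setKey(int key) {
        this.key = key;
    }

    public int getLabel() {
        return label;
    }

    public void setLabel(int label) {
        this.label = label;
    }

    public double getDistance() {
        return distance;
    }

    public void setDistance(double distance) {
        this.distance = distance;
    }

    // Writable contract: serialize the fields...
    @Override
    public void write(DataOutput out) throws IOException
    {
        out.writeInt(key);
        out.writeInt(label);
        out.writeDouble(distance);
    }

    // ...and read them back in exactly the same order.
    @Override
    public void readFields(DataInput in) throws IOException
    {
        key = in.readInt();
        label = in.readInt();
        distance = in.readDouble();
    }

    // Comparable takes a single argument: compare this object against 'other'.
    @Override
    public int compareTo(KeyLabelDistance other)
    {
        if (this == other)
            return 0;
        if (key < other.getKey())
            return -1;
        else if (key > other.getKey())
            return 1;
        // If the keys are equal, look at the distances: since a larger "distance"
        // means a greater "similarity", the comparison is reversed.
        if (distance < other.getDistance())
            return 1;
        else if (distance > other.getDistance())
            return -1;
        return 0;
    }
}
```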
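The order defined by `compareTo` (key ascending, then distance descending) can also be evaluated directly on the serialized bytes, which is what a raw sort comparator would do instead of deserializing both keys. This is a plain-Java sketch assuming the layout key (int), label (int), distance (double), i.e. the order `write()` is assumed to emit them; `RawSortOrder` and its methods are hypothetical and not part of Hadoop.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RawSortOrder {

    // Assumed serialized layout: key at byte 0, label at byte 4, distance at byte 8.
    static final int KEY_OFFSET = 0;
    static final int DISTANCE_OFFSET = 8;

    // ByteBuffer is big-endian by default, matching DataOutput.writeInt/writeDouble.
    static byte[] serialize(int key, int label, double distance) {
        return ByteBuffer.allocate(16).putInt(key).putInt(label).putDouble(distance).array();
    }

    // Key ascending, then distance descending, read straight from the bytes
    // without building objects. l1 and l2 are unused because the layout is fixed-size.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        ByteBuffer x = ByteBuffer.wrap(b1);
        ByteBuffer y = ByteBuffer.wrap(b2);
        int byKey = Integer.compare(x.getInt(s1 + KEY_OFFSET), y.getInt(s2 + KEY_OFFSET));
        if (byKey != 0) {
            return byKey;
        }
        // Operands swapped so that the larger distance sorts first.
        return Double.compare(y.getDouble(s2 + DISTANCE_OFFSET), x.getDouble(s1 + DISTANCE_OFFSET));
    }

    public static void main(String[] args) {
        List<byte[]> recs = new ArrayList<>(Arrays.asList(
                serialize(2, 7, 0.1),
                serialize(1, 3, 0.4),
                serialize(2, 5, 0.9),
                serialize(1, 8, 0.8)));
        recs.sort((a, b) -> compare(a, 0, a.length, b, 0, b.length));
        // prints "1 8 0.8", "1 3 0.4", "2 5 0.9", "2 7 0.1", one per line
        for (byte[] r : recs) {
            ByteBuffer buf = ByteBuffer.wrap(r);
            System.out.println(buf.getInt(0) + " " + buf.getInt(4) + " " + buf.getDouble(8));
        }
    }
}
```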
The code for the group comparator is as follows:
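```java
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class KeyLabelDistanceGroupingComparator extends WritableComparator {

    // Registers the key class; "true" lets the parent create KeyLabelDistance
    // instances to deserialize the raw bytes into before calling compare(a, b).
    public KeyLabelDistanceGroupingComparator() {
        super(KeyLabelDistance.class, true);
    }

    // Groups on the natural key only, ignoring label and distance.
    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        KeyLabelDistance lhs = (KeyLabelDistance) a;
        KeyLabelDistance rhs = (KeyLabelDistance) b;
        if (lhs == rhs)
            return 0;
        if (lhs.getKey() < rhs.getKey())
            return -1;
        else if (lhs.getKey() > rhs.getKey())
            return 1;
        return 0;
    }
}
```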
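To check my understanding of how the two comparators interact, here is a plain-Java simulation: records are first sorted with the full ordering, then a new reduce group starts whenever the key-only grouping comparator reports that consecutive records differ. `GroupingSimulation` and `Rec` are hypothetical; this mimics the framework's behavior rather than using Hadoop's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingSimulation {

    // Hypothetical stand-in for KeyLabelDistance.
    static final class Rec {
        final int key;
        final int label;
        final double distance;

        Rec(int key, int label, double distance) {
            this.key = key;
            this.label = label;
            this.distance = distance;
        }
    }

    // Plays the role of the sort comparator: key ascending, distance descending.
    static final Comparator<Rec> SORT =
            Comparator.<Rec>comparingInt(r -> r.key)
                    .thenComparing(Comparator.<Rec>comparingDouble(r -> r.distance).reversed());

    // Plays the role of the grouping comparator: natural key only.
    static final Comparator<Rec> GROUP = Comparator.comparingInt(r -> r.key);

    // Sorts with SORT, then opens a new group whenever GROUP says the current
    // record differs from the previous one; each group would be one reduce() call.
    static List<List<Rec>> groups(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(SORT);
        List<List<Rec>> result = new ArrayList<>();
        Rec previous = null;
        for (Rec r : sorted) {
            if (previous == null || GROUP.compare(previous, r) != 0) {
                result.add(new ArrayList<>());
            }
            result.get(result.size() - 1).add(r);
            previous = r;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rec> recs = Arrays.asList(
                new Rec(2, 7, 0.1),
                new Rec(1, 3, 0.4),
                new Rec(2, 5, 0.9),
                new Rec(1, 8, 0.8));
        // prints "key 1 -> labels [8, 3]" then "key 2 -> labels [5, 7]"
        for (List<Rec> group : groups(recs)) {
            List<Integer> labels = new ArrayList<>();
            for (Rec r : group) {
                labels.add(r.label);
            }
            System.out.println("key " + group.get(0).key + " -> labels " + labels);
        }
    }
}
```

With this data, the group for key 1 sees label 8 (distance 0.8) before label 3 (distance 0.4), which is the decreasing-distance order I am after within each key.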
Any help is appreciated. Thanks in advance.