
Can someone help me understand why I am getting this odd behavior from my custom data type? I am referring to this, and my mapper code is:

public class customDataMapper extends Mapper<LongWritable, Text,Text,customText > {

Text url = new Text();
Text date = new Text();
Text ip = new Text();
customText ctext = new customText();

public void map (LongWritable key , Text value , Context context) throws IOException , InterruptedException{

    String words[] = value.toString().split("|");
    url.set(words[1]);
    date.set(words[2]);
    ip.set(words[4]);
    ctext.set(date,ip);
    context.write(url, ctext);
}   
}

and the customText data type code is:

public class customText implements WritableComparable<customText>{

private Text url , ip;

public customText(){
    this.url=new Text();
    this.ip=new Text();

}

public customText(Text URL , Text IP){
    this.url=URL;
    this.ip=IP;


}


public void set (Text URL , Text IP){
    this.url=URL;
    this.ip=IP;

}


public void readFields(DataInput in) throws IOException{
    url.readFields(in);
    ip.readFields(in);

}

public void write(DataOutput out ) throws IOException{
    url.write(out);
    ip.write(out);

}


public int compareTo(customText o){
    if(url.compareTo(o.ip)==0){

        return (ip.compareTo(o.ip));

    }
    else return (url.compareTo(o.ip));
}


public boolean equals(Object o){


    if (o instanceof customText){
    customText other = (customText)o;   
    return (url.equals(other.ip)) && ip.equals(other.ip);
    }
    return false;
}

public int hashCode(){
    return url.hashCode();

 }
}

and I received my output as

hduser@pradeep-VirtualBox:~/builds$ hadoop fs -cat /user/hadoop/dir8_customData/output/part-m-00000
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51

and my input file is

127248|/rr.html|2014-03-10|12:32:08|42.416.153.181
12|/rr12.html|2014-03-11|12:00:00|42.416.153.182
127241|/rr3232.html|2014-03-12|13:32:00|42.416.153.183
1272|/rrw33232.html|2014-03-15|14:32:08|42.416.153.184
121|/rr21212.html|2015-12-10|16:32:08|42.416.153.185

Can someone help me understand why I received this output? Secondly, I am not sure how compareTo works, in particular when a new group is created in the reducer. I am new to Hadoop and Java programming.

Thanks

Abdul Fatir
Anaadih.pradeep

2 Answers


You are splitting on a | using split("|"). This should be split("\\|"). See this SO answer for why escaping a pipe is needed.
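To illustrate the escaping point, here is a minimal sketch in plain Java, with no Hadoop dependency. The class and method names (PipeSplitDemo, splitRecord, splitRecordQuoted) are made up for this example; the record is the first line of the input file from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original job.
class PipeSplitDemo {

    // Splits one record on a literal '|' by escaping it in the regex.
    static String[] splitRecord(String line) {
        return line.split("\\|");
    }

    // Same result, letting Pattern.quote build the literal pattern.
    static String[] splitRecordQuoted(String line) {
        return line.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String line = "127248|/rr.html|2014-03-10|12:32:08|42.416.153.181";

        String[] words = splitRecord(line);
        System.out.println(words.length); // 5
        System.out.println(words[1]);     // /rr.html
        System.out.println(words[2]);     // 2014-03-10
        System.out.println(words[4]);     // 42.416.153.181

        // Unescaped, "|" is regex alternation between two empty patterns,
        // so it matches the empty string and the line is cut between
        // every character instead of at the pipes.
        String[] broken = line.split("|");
        System.out.println(broken.length > 5); // true
    }
}
```

With the unescaped pattern, words[1] and words[2] are single characters from the leading number, which is why every output key in the question is 1.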

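Your customText class needs to override toString() so that the output format can write a readable representation of the data contained within the object. Without it, Java falls back to Object.toString(), which prints the class name and hash code (the customText@51 in your output). For example: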

@Override
public String toString() {
    return url + "," + ip;
}
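The effect can be reproduced without Hadoop. The sketch below uses two hypothetical stand-in classes holding plain String fields instead of Text; the fixed hash code of 81 is an assumption chosen only so that the hex suffix comes out as 51, matching the output in the question.

```java
// Hypothetical demo class; NoToString and WithToString are stand-ins
// for the custom Writable, using String fields instead of Text.
class DefaultToStringDemo {

    // No toString() override: Object.toString() is used, which returns
    // getClass().getName() + "@" + Integer.toHexString(hashCode()).
    static class NoToString {
        @Override
        public int hashCode() {
            return 81; // fixed for the demo; 81 is 0x51 in hex
        }
    }

    // With toString() overridden, printing shows the field contents.
    static class WithToString {
        private final String url;
        private final String ip;

        WithToString(String url, String ip) {
            this.url = url;
            this.ip = ip;
        }

        @Override
        public String toString() {
            return url + "," + ip;
        }
    }

    public static void main(String[] args) {
        // Prints the class name followed by "@51".
        System.out.println(new NoToString());
        // Prints "/rr.html,42.416.153.181".
        System.out.println(new WithToString("/rr.html", "42.416.153.181"));
    }
}
```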

You are also setting your Text objects incorrectly:

public void set (Text URL , Text IP){
    this.url=URL;
    this.ip=IP;
}

This should be:

public void set(Text URL , Text IP){
    this.url.set(URL);
    this.ip.set(IP);
}
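The difference between the two versions of set is aliasing: the first stores the caller's object, the second copies its contents. Here is a rough sketch of that difference, using StringBuilder as a mutable stand-in for Text (an assumption made so the example runs without Hadoop); AliasingPair and CopyingPair are hypothetical names.

```java
// Hypothetical demo class; StringBuilder stands in for Hadoop's mutable Text.
class AliasingDemo {

    // Mirrors "this.url = URL": keeps a reference to the caller's object.
    static class AliasingPair {
        private StringBuilder url = new StringBuilder();

        void set(StringBuilder newUrl) {
            this.url = newUrl;
        }

        String url() {
            return url.toString();
        }
    }

    // Mirrors "this.url.set(URL)": copies the contents into its own buffer.
    static class CopyingPair {
        private final StringBuilder url = new StringBuilder();

        void set(StringBuilder newUrl) {
            url.setLength(0);
            url.append(newUrl);
        }

        String url() {
            return url.toString();
        }
    }

    public static void main(String[] args) {
        StringBuilder shared = new StringBuilder("/rr.html");

        AliasingPair aliasing = new AliasingPair();
        aliasing.set(shared);
        CopyingPair copying = new CopyingPair();
        copying.set(shared);

        // The caller reuses its buffer for the next record,
        // as the mapper does with its Text fields.
        shared.setLength(0);
        shared.append("/rr12.html");

        System.out.println(aliasing.url()); // /rr12.html (changed behind its back)
        System.out.println(copying.url());  // /rr.html   (unaffected)
    }
}
```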

If your custom Writable object is being used as a Value, it only needs to implement the Writable interface and not WritableComparable. The WritableComparable interface is only needed for keys where Hadoop needs to group and sort the keys.

Your compareTo() method doesn't make sense (you're comparing URL to IP):

public int compareTo(customText o){
    if(url.compareTo(o.ip)==0){
        return (ip.compareTo(o.ip));
    }
    else return (url.compareTo(o.ip));
}

Should look like:

@Override
public int compareTo(customText o) {

    int result = url.compareTo(o.url);
    if (result != 0) {
        return result;
    }
    return ip.compareTo(o.ip);
}
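On the question of when a new group is created: as far as I know, when no separate grouping comparator is configured, Hadoop sorts the keys with their comparator and starts a new reducer group whenever the next key does not compare equal to the previous one. The sketch below imitates that in plain Java. UrlIp is a hypothetical stand-in with String fields, and countGroups is an illustration only, not Hadoop's actual code.

```java
import java.util.Arrays;

// Hypothetical demo class imitating sort-then-group on composite keys.
class CompareToDemo {

    // Stand-in for the custom key: ordered by url first, then by ip.
    static final class UrlIp implements Comparable<UrlIp> {
        final String url;
        final String ip;

        UrlIp(String url, String ip) {
            this.url = url;
            this.ip = ip;
        }

        @Override
        public int compareTo(UrlIp o) {
            int result = url.compareTo(o.url);
            if (result != 0) {
                return result;
            }
            return ip.compareTo(o.ip);
        }
    }

    // Sorts a copy of the keys, then counts how often a key differs
    // from its predecessor; each difference starts a new group.
    static int countGroups(UrlIp[] keys) {
        UrlIp[] sorted = keys.clone();
        Arrays.sort(sorted);
        int groups = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0 || sorted[i - 1].compareTo(sorted[i]) != 0) {
                groups++;
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        UrlIp[] keys = {
            new UrlIp("/rr.html", "42.416.153.181"),
            new UrlIp("/rr12.html", "42.416.153.182"),
            new UrlIp("/rr.html", "42.416.153.181"), // same key as the first
            new UrlIp("/rr.html", "42.416.153.183"), // same url, different ip
        };
        System.out.println(countGroups(keys)); // 3
    }
}
```

In the job from the question, customText is the value and Text is the key, so this ordering only comes into play if the custom type is used as the map output key.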

Your hashcode should look something like this:

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((ip == null) ? 0 : ip.hashCode());
    result = prime * result + ((url == null) ? 0 : url.hashCode());
    return result;
}

At the moment it only uses url and ignores ip.

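You are also passing date into ctext.set(date,ip), but the field that receives it inside the custom object is called url, so the field name is misleading. Rename it (for example to date) if the date is what you intend to store.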

Style-wise, your variable names should be lowercase (URL should be url) and class names should start with an uppercase letter (customText should be CustomText).

Community
Binary Nerd

Since the toString() method is already available in the class you are inheriting from, you have to @Override toString().

It should have flagged this before you ran the program: not an error, but at least a yellow warning stating that the method should be overridden. Or am I confusing it with Android Studio?

Kuantew
  • Thanks for this information @Xenidia. If I write a data type like CustomText(Text, Text), as I did, I have to use toString() for it to be written through the context. But if a custom data type contains two IntWritable fields, do I still have to use toString() for the write? – Anaadih.pradeep Jun 08 '16 at 10:27
  • This is related to how the object is written to disk by your output format. If you're using TextOutputFormat, it calls the `toString()` method of CustomText when writing it to disk. In your output you can see that this results in the class name followed by the object's hash code, which is the default implementation of `toString()` in Java. – Binary Nerd Jun 08 '16 at 10:43
  • It doesn't matter what data type you are working with: you have to use toString(), whether the method is different (there is an exception for context or constructors) or the data base is different (the exception being the case that Binary Nerd describes). – Kuantew Jun 08 '16 at 13:55