I am trying to learn Hadoop (MapReduce). I have a mapper method in which I use LocalDateTime to parse the epoch_time field (the first ;-separated field of each record), which is expressed in milliseconds. The dataset contains epoch timestamps from 25.05.2015 to 10.08.2015.
I would like to convert each epoch timestamp to a date, but only emit the dates between 05.06.2015 and 15.06.2015.
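One possible way to express the filter, sketched under the assumption that the timestamps are UTC (as the mapper below assumes with ZoneOffset.UTC) and that both 05.06.2015 and 15.06.2015 should be included: compute the window bounds once as epoch milliseconds and compare the raw values, so out-of-range records can be skipped before any date formatting. EpochWindow, START_MS, END_MS and inWindow are hypothetical names, not part of Hadoop or the original code.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper: the filter window expressed as epoch milliseconds (UTC assumed).
class EpochWindow {
    // 05.06.2015 00:00:00.000 UTC, inclusive lower bound
    static final long START_MS = LocalDate.of(2015, 6, 5)
            .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    // 16.06.2015 00:00:00.000 UTC, exclusive upper bound, so all of 15.06.2015 is included
    static final long END_MS = LocalDate.of(2015, 6, 16)
            .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();

    // True if the timestamp falls on 05.06.2015 through 15.06.2015 (UTC)
    static boolean inWindow(long epochMillis) {
        return epochMillis >= START_MS && epochMillis < END_MS;
    }

    public static void main(String[] args) {
        System.out.println(START_MS);                     // 1433462400000
        System.out.println(END_MS);                       // 1434412800000
        System.out.println(inWindow(1432512000000L));     // false (25.05.2015)
        System.out.println(inWindow(1434412799999L));     // true (last ms of 15.06.2015)
    }
}
```

Using a half-open interval [START_MS, END_MS) avoids having to pick a "last millisecond" value for 15.06.2015.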
Here is what I have so far. The code below produces this output:
25.05.2015
25.06.2015
etc.
Desired output (each date followed by the number of occurrences on that date):
05.06.2015 5
06.06.2015 53
07.06.2015 41
etc.
Mapper
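import java.io.IOException;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class mapper extends Mapper<Object, Text, Text, IntWritable> {
    // Reused across map() calls instead of being re-created per record
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    private final Text data = new Text();
    private final IntWritable one = new IntWritable(1);

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Fields are ';'-separated; the first one is the epoch time in milliseconds
        String[] userinput = value.toString().split(";");
        try {
            // Divide by 1000 because ofEpochSecond expects seconds, not milliseconds
            LocalDateTime epoch = LocalDateTime.ofEpochSecond(Long.parseLong(userinput[0]) / 1000, 0, ZoneOffset.UTC);
            data.set(epoch.format(FORMAT));
            // Emits every date; no date-range filter is applied yet
            context.write(data, one);
        } catch (Exception e) {
            System.out.println("Error: " + e);
        }
    }
}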
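A minimal sketch of how the date filter could fit into the map logic, written without Hadoop types so it can be tested locally. It assumes UTC timestamps in milliseconds in the first ;-separated field and an inclusive 05.06.2015 to 15.06.2015 range; DateKeyExtractor and keyFor are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

// Hypothetical, Hadoop-free version of the map() logic with a date-range filter.
class DateKeyExtractor {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    private static final LocalDate FIRST = LocalDate.of(2015, 6, 5);  // inclusive
    private static final LocalDate LAST = LocalDate.of(2015, 6, 15);  // inclusive

    // Returns the "dd.MM.yyyy" key for a record, or empty if the record
    // is malformed or its UTC date falls outside FIRST..LAST.
    static Optional<String> keyFor(String line) {
        String[] fields = line.split(";");
        if (fields.length == 0) {
            return Optional.empty(); // e.g. a line consisting only of ';'
        }
        long epochMillis;
        try {
            epochMillis = Long.parseLong(fields[0].trim());
        } catch (NumberFormatException e) {
            return Optional.empty(); // e.g. a header line or a corrupt value
        }
        LocalDate date = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
        if (date.isBefore(FIRST) || date.isAfter(LAST)) {
            return Optional.empty(); // outside the requested window
        }
        return Optional.of(date.format(FORMAT));
    }

    public static void main(String[] args) {
        System.out.println(keyFor("1433462400000;hello").orElse("skipped")); // 05.06.2015
        System.out.println(keyFor("1432512000000;hello").orElse("skipped")); // skipped (25.05.2015)
    }
}
```

Inside the real mapper, the try block could then call Optional<String> k = DateKeyExtractor.keyFor(value.toString()); and only run data.set(k.get()) and context.write(data, one) when k.isPresent(). Out-of-range records are never emitted, so the reducer only ever sees the dates in the window.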
Reducer
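import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class reducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Holds the per-date total
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // Add up the 1s emitted by the mapper for this date
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}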
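Since the desired output is in date order, one detail may be worth checking: Hadoop sorts the map output keys before they reach reduce(), and Text keys are compared byte by byte. For ASCII keys like these, that matches String's natural order, so the illustrative sketch below (KeyOrderDemo is a hypothetical name) mimics what a single reducer would see. With dd.MM.yyyy keys the order is chronological only within one month, which is enough for 05.06.2015 to 15.06.2015 but not for the whole dataset; yyyy-MM-dd keys sort chronologically across months.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: String's natural order stands in for Hadoop's byte-wise Text order.
class KeyOrderDemo {
    static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // dd.MM.yyyy: the comparison starts at the day digits, so 01.07 sorts before 05.06
        System.out.println(sorted(Arrays.asList("15.06.2015", "01.07.2015", "05.06.2015")));
        // prints [01.07.2015, 05.06.2015, 15.06.2015]

        // yyyy-MM-dd: lexicographic order equals chronological order
        System.out.println(sorted(Arrays.asList("2015-06-15", "2015-07-01", "2015-06-05")));
        // prints [2015-06-05, 2015-06-15, 2015-07-01]
    }
}
```

With several reducers, each output part file is sorted on its own, so the overall order holds only within each file.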