I am trying to learn Hadoop (MapReduce). I have a mapper method in which I parse the epoch_time field, expressed in milliseconds, from a dataset whose fields are separated by semicolons. The dataset contains epoch values between 25.05.2015 and 10.08.2015.

I would like to convert each epoch value to a date, but only emit the dates that fall between 05.06.2015 and 15.06.2015.

Here is what I have achieved so far. The code below produces the following:

output:

25.05.2015

25.06.2015

etc

Desired output:

05.06.2015 5 // count of word occurrences on this date

06.06.2015 53

07.06.2015 41

etc

Mapper

    public class mapper extends Mapper<Object, Text, Text, IntWritable> {
        private Text data = new Text();
        private IntWritable one = new IntWritable(1);
        String time;

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] userinput = value.toString().split(";");
            try {
                LocalDateTime epoch = LocalDateTime.ofEpochSecond(Long.parseLong(userinput[0]) / 1000, 0, ZoneOffset.UTC);
                DateTimeFormatter f = DateTimeFormatter.ofPattern("dd.MM.yyyy");
                time = epoch.format(f);

                data.set(time);
                context.write(data, one);
            } catch (Exception e) {
                System.out.println("Error: " + e);
            }
        }
    }

Reducer

    public class reducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable one = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            one.set(sum);
            context.write(key, one);
        }
    }

user2023
  • Hi, the code is just a snippet from my mapper class. I have a mapper, a reducer and a driver class. Could you please advise me which date pattern to use? Many thanks – user2023 Nov 02 '17 at 07:01
  • And my point about not being about Hadoop is still accurate. Write a unit test or regular Java program for identifying date ranges, then put in the conditions into the mapper that can extract the subset of date ranges you care about – OneCricketeer Nov 02 '17 at 13:41
  • Hi, you are right, I have periods in the date format. I have updated my code; please see the updated snippet. How do I access the date range? Could you please provide an example of how to get the date range? – user2023 Nov 02 '17 at 16:33
  • Parse your input to millisecond epoch then it is as simple as `if (startDate <= yourData && yourData <= endDate)` ... Then use `context.write()` else, don't – OneCricketeer Nov 02 '17 at 18:39
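The comparison suggested in the comments can be tried in a regular Java program before it goes into the mapper. The following is only a minimal sketch: the class name `EpochRangeCheck`, the constants, and the helper `inRange` are hypothetical names, and it assumes the boundaries are meant in UTC with 15.06.2015 included as a whole day, so the upper bound is the exclusive start of 16.06.2015.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper class, not part of the original question or answer.
final class EpochRangeCheck {
    // Inclusive lower bound: 05.06.2015 00:00:00 UTC as epoch milliseconds
    static final long START_MS =
            LocalDate.of(2015, 6, 5).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    // Exclusive upper bound: 16.06.2015 00:00:00 UTC, so all of 15.06.2015 is kept
    static final long END_MS =
            LocalDate.of(2015, 6, 16).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();

    // True when the epoch value (in milliseconds) lies inside the range
    static boolean inRange(long epochMillis) {
        return START_MS <= epochMillis && epochMillis < END_MS;
    }

    public static void main(String[] args) {
        // Parse the first field of a semicolon-separated record (made-up sample line)
        String line = START_MS + ";some;other;fields";
        long epochMillis = Long.parseLong(line.split(";")[0]);
        System.out.println(inRange(epochMillis));
    }
}
```

In the mapper, `context.write()` would then be called only when `inRange(...)` returns true.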

1 Answer

So you only care about this bracketed data... 25.05.2015 [05.06.2015 ... 15.06.2015] 10.08.2015

If that's all you need, it is as simple as an if statement.

I'm not that familiar with Java 8, but check this question: "Java: how do I check if a Date is within a certain range?"

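import java.io.IOException;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class mapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    private final Text data = new Text();

    // Define the boundaries (UTC): start is inclusive, end is the exclusive start of 16.06.2015
    private final LocalDateTime start = LocalDate.parse("05.06.2015", FMT).atStartOfDay();
    private final LocalDateTime end = LocalDate.parse("15.06.2015", FMT).plusDays(1).atStartOfDay();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String[] userinput = value.toString().split(";");
        try {
            long seconds = Long.parseLong(userinput[0]) / 1000; // epoch milliseconds -> seconds
            LocalDateTime inputEpoch = LocalDateTime.ofEpochSecond(seconds, 0, ZoneOffset.UTC);

            // Filter your data: only write records inside the date range
            if (!inputEpoch.isBefore(start) && inputEpoch.isBefore(end)) {
                data.set(inputEpoch.format(FMT));
                context.write(data, ONE);
            }
        } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
            // Skip records whose first field is missing or not a number
        }
    }
}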
OneCricketeer
  • I amended my class and applied your suggestions, but I get 2 errors at `if (inputEpoch.isAfter(start) && inputEpoch.isBefore(end)) {`: [javac] method `ChronoLocalDateTime.isAfter(ChronoLocalDateTime<?>)` is not applicable, and [javac] method `ChronoLocalDateTime.isBefore(ChronoLocalDateTime<?>)` is not applicable (argument mismatch; `LocalDate` cannot be converted to `ChronoLocalDateTime<?>`) – user2023 Nov 02 '17 at 21:08
  • Need `LocalDateTime`. Try again. – OneCricketeer Nov 03 '17 at 02:34
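The compiler error in the comment above comes from mixing types: `LocalDateTime.isAfter` and `isBefore` accept a `ChronoLocalDateTime<?>`, so a `LocalDate` boundary is rejected. One way around it, sketched below under stated assumptions, is to compare like with like by reducing the input to a `LocalDate` first. The class `DateRangeFilter` and the method `dateKeyOrNull` are hypothetical names used only for illustration, and UTC is assumed as in the question.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class showing a LocalDate-to-LocalDate comparison.
final class DateRangeFilter {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    // Both boundary days are inclusive
    static final LocalDate FIRST_DAY = LocalDate.parse("05.06.2015", FMT);
    static final LocalDate LAST_DAY = LocalDate.parse("15.06.2015", FMT);

    // Returns the dd.MM.yyyy key for an in-range record, or null if it should be skipped
    static String dateKeyOrNull(long epochMillis) {
        LocalDate day = LocalDateTime
                .ofEpochSecond(epochMillis / 1000, 0, ZoneOffset.UTC)
                .toLocalDate();
        boolean inRange = !day.isBefore(FIRST_DAY) && !day.isAfter(LAST_DAY);
        return inRange ? day.format(FMT) : null;
    }
}
```

Inside `map()`, a non-null result would be passed to `data.set(...)` before calling `context.write(data, ONE)`; a null result means the record is skipped.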