
I'm having an issue with a CSV dataset in HDFS when performing MapReduce with rmr2.

With only one file, the MapReduce job works fine and no error occurs, but with two or more datasets in the same folder, the results start to break down, as can be seen below:

Error Screenshot

The error starts at line 16 and continues to the end of the file.

The MapReduce job used is:

calc = mapreduce(
  input = "hdfs://127.0.0.1:8020/user/cloudera/flumeFinal",
  input.format = make.input.format(format = "csv", sep = ",",
                                   col.names = col.names, stringsAsFactors = FALSE),
  # map: emit the second column as the key, with a count of 1
  map = function(k, lines) {
    k <- lines[2]
    return(keyval(k, 1))
  },
  # reduce: sum the counts for each key
  reduce = function(k, lines) {
    keyval(k, sum(lines))
  }
)

Has anyone faced a similar issue who can help with this?

Thanks, Bruno
