I'm computing the correlation coefficients of a data frame with the code below, and it works perfectly.

Is there a problem with computing the correlation coefficients of a data frame using cor() this way, or is this code fine to use on large data?

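library(rmr2)

cc = function(input, output = NULL) {
    # Map: drop the first row of the chunk (presumably the CSV header),
    # convert every column to numeric, and emit everything under one key.
    cc.map = function(., v) {
        data <- v[-1, ]
        data[, 1:length(data)] = lapply(data[, 1:length(data)], as.numeric)
        keyval("korelasi", data)
    }
    # Reduce: compute the correlation matrix of all rows sharing the key.
    cc.reduce = function(k, v) {
        keyval(k, cor(v))
    }
    mapreduce(
        input = input,
        output = output,
        input.format = make.input.format("csv", sep = ",", fill = TRUE, stringsAsFactors = FALSE),
        map = cc.map,
        reduce = cc.reduce,
        combine = TRUE)  # TRUE reuses cc.reduce as the combiner
}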

1 Answer

cor() is the standard way to compute a correlation matrix in R. You are already calling it inside a reduce function, so Hadoop will handle this for large datasets.
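
As a minimal sketch of what the map and reduce steps do to the data, here is the same conversion and cor() call in plain R, without Hadoop or rmr2. The data frame `raw` and its column names are made up for illustration; it only mimics a CSV chunk read with stringsAsFactors = FALSE, where every column arrives as character.

```r
# Hypothetical toy data standing in for one CSV chunk read as character columns.
raw <- data.frame(
    x = c("1", "2", "3", "4"),
    y = c("2", "4", "6", "8"),
    z = c("4", "3", "2", "1"),
    stringsAsFactors = FALSE
)

# Same idea as the map step: convert every column to numeric.
num <- raw
num[] <- lapply(raw, as.numeric)

# Same idea as the reduce step: correlation matrix of all columns.
m <- cor(num)
print(m)
```

cor() returns a square matrix with one row and one column per input column. By default it uses the Pearson coefficient, and other methods can be selected through its method argument.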

Henrique Andrade