I am trying to write a function in R that calculates the means of nitrate, sulfate, and ID. My original data frame has 4 columns (date, nitrate, sulfate, ID), so I wrote the following code:

prueba<-read.csv("C:/Users/User/Desktop/coursera/001.csv",header=T)

columnmean<-function(y, removeNA=TRUE){ #y will be a matrix
    whichnumeric<-sapply(y, is.numeric)#which columns are numeric
    onlynumeric<-y[ , whichnumeric] #selecting just the numeric columns
    nc<-ncol(onlynumeric) #number of columns in onlynumeric
    means<-numeric(nc)#empty vector for the means
        for(i in 1:nc){
            means[i]<-mean(onlynumeric[,i], na.rm = TRUE) 
        }



}

columnmean(prueba)

When I run these steps line by line on my data, outside the function, I get the mean values. However, when I call the function so that it performs all the steps itself, it raises no error but also does not output any value; my environment contains only the data frame 'prueba' and the columnmean function.

What am I doing wrong?

Gregor Thomas

2 Answers

A reproducible example would be nice (although not absolutely necessary in this case).

You need a final line return(means) at the end of your function. (Some old-school R users maintain that means alone is OK - R automatically returns the value of the last expression evaluated within the function whether return() is specified or not - but I feel that using return() explicitly is better practice.)
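As a minimal sketch of that fix, here is the function with the final return(means) line added. The toy data frame is a hypothetical stand-in for the asker's CSV (its values are made up), and the sketch also makes three optional tweaks that are not part of the original code: na.rm uses the removeNA argument, drop = FALSE keeps a single numeric column as a data frame, and seq_len(nc) avoids looping when there are no numeric columns.

```r
# Hypothetical stand-in for the CSV file (values invented for illustration)
prueba <- data.frame(
  Date    = c("2003-01-01", "2003-01-02", "2003-01-03"),
  sulfate = c(1.0, NA, 3.0),
  nitrate = c(0.5, 1.5, NA),
  ID      = c(1L, 1L, 1L)
)

columnmean <- function(y, removeNA = TRUE) {
  whichnumeric <- sapply(y, is.numeric)            # which columns are numeric
  onlynumeric <- y[, whichnumeric, drop = FALSE]   # keep only those columns
  nc <- ncol(onlynumeric)
  means <- numeric(nc)                             # empty vector for the means
  for (i in seq_len(nc)) {
    means[i] <- mean(onlynumeric[, i], na.rm = removeNA)
  }
  return(means)                                    # the line the original lacked
}

columnmean(prueba)
```

With this toy data the call prints the three means (2, 1 and 1) instead of nothing.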

colMeans(y[sapply(y, is.numeric)], na.rm=TRUE)

is a slightly more compact way to achieve your goal (although there's nothing wrong with being a little more verbose if it makes your code easier for you to read and understand).
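To illustrate the compact version, the sketch below applies the same colMeans() call to a small made-up data frame (the name df and its values are assumptions for illustration only). One visible difference from the loop approach is that the result keeps the column names.

```r
# Hypothetical toy data frame with one non-numeric and three numeric columns
df <- data.frame(
  Date    = c("a", "b", "c"),
  sulfate = c(1.0, NA, 3.0),
  nitrate = c(0.5, 1.5, NA),
  ID      = c(1L, 1L, 1L)
)

# Select the numeric columns, then average each one, ignoring NA values
res <- colMeans(df[sapply(df, is.numeric)], na.rm = TRUE)
res          # named vector: sulfate, nitrate, ID
names(res)
```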

Ben Bolker

The result of an R function is the value of the last expression. Your last expression is:

for(i in 1:nc){
            means[i]<-mean(onlynumeric[,i], na.rm = TRUE) 
        }

It may seem strange that the value of that expression is NULL, but that is how for-loops work in R. The means vector does get filled in one element at a time, but it is never returned, which is why BenBolker's advice to use return() is correct (as his advice almost always is). For-loops in R are a notable exception to the functional programming paradigm. They provide a mechanism for looping (as do the various *apply functions), but the commands inside the loop exert their effects in the calling environment via side effects (unlike the apply functions).
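A small sketch makes the behaviour concrete. The function names loop_last and with_return are hypothetical and exist only for this illustration: the first ends on a for-loop and so yields NULL, the second names the vector as its last expression, and the sapply() call shows an apply-style function returning its result directly.

```r
# Ends on a for-loop: the function's value is the loop's value, NULL
loop_last <- function() {
  means <- numeric(3)
  for (i in 1:3) means[i] <- i * 2
}

# Same loop, but the filled vector is the last expression evaluated
with_return <- function() {
  means <- numeric(3)
  for (i in 1:3) means[i] <- i * 2
  means
}

is.null(loop_last())   # TRUE
with_return()          # the filled vector

# An apply-style call returns its result without needing side effects
sapply(1:3, function(i) i * 2)
```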

IRTFM