
I created a function that takes in a data frame and returns the mean, median, and IQR for the numeric variables in that data frame. When I test the function, the output contains 3 null values. How would I remove these null values?

df.numeric.summary <- function(data.frame1){
  variable.list = list()
  numcols <- sapply(data.frame1, is.numeric)
  for (i in 1:ncol(data.frame1)) {
    if (is.numeric(data.frame1[[i]])) {
      variable.list[[i]] = list(c("Mean" = mean(data.frame1[[i]], na.rm = TRUE), "Median" = median(data.frame1[[i]]), "IQR" = IQR(data.frame1[[i]])))
    }
  }
  return(variable.list)
}

My output looks like this:

[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
[[4]][[1]]
    Mean   Median      IQR 
10.76687  3.56400  7.75100 


[[5]]
[[5]][[1]]
    Mean   Median      IQR 
10.43467  1.40000  4.50100 


[[6]]
[[6]][[1]]
    Mean   Median      IQR 
3.701434 0.839000 2.429500 

whereas the output should look like this:

$Pb1
    Mean   Median      IQR 
10.76687  3.56400  7.75100 

$Pb2
    Mean   Median      IQR 
10.43467  1.40000  4.50100 

$Pb3
    Mean   Median      IQR 
3.701434 0.839000 2.429500 
Hannah
  • Do you mean NULL or NA? They are different in R. Please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data. – MrFlick Oct 16 '17 at 13:36
  • I added my code. – Hannah Oct 16 '17 at 13:40
  • And what exactly is the desired output? You are assigning to `variable.list[[i]]`, but when `i=1` the column doesn't appear to be numeric. So the first time you assign is when `i=4`, which leaves those NULL values. – MrFlick Oct 16 '17 at 13:42
  • I added an image of the desired output. I apologize for it not being there before. – Hannah Oct 16 '17 at 13:47

3 Answers


You have to use `na.rm = TRUE`:

x <- c(1,2,5,7,NA,3)
mean(x) # returns NA
# [1] NA
mean(x, na.rm=TRUE) # returns 3.6
# [1] 3.6

The same applies to `median()`.
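As a minimal sketch reusing the same `x`, both `median()` and `IQR()` accept `na.rm` too. Note that `IQR()` without it stops with an error rather than returning `NA`, which is why the question's function would fail on a column containing missing values.

```r
x <- c(1, 2, 5, 7, NA, 3)

median(x)                # contains NA, so the result is NA
# [1] NA
median(x, na.rm = TRUE)  # median of 1, 2, 3, 5, 7
# [1] 3
IQR(x, na.rm = TRUE)     # 75th percentile (5) minus 25th percentile (2)
# [1] 3
# IQR(x) without na.rm = TRUE raises an error about missing values
```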

Prradep

Passing `na.rm = TRUE` as an extra argument to `lapply()` or `sapply()` when calculating the mean or median should help:

> iris1 <- iris
> # introduce NAs
> iris1[2,3] <- NA
> iris1[3,2] <- NA
> # without na.rm
> lapply(iris1[1:4], mean)
$Sepal.Length
[1] 5.843333

$Sepal.Width
[1] NA

$Petal.Length
[1] NA

$Petal.Width
[1] 1.199333

> # with na.rm = TRUE
> lapply(iris1[1:4], mean, na.rm = TRUE)
$Sepal.Length
[1] 5.843333

$Sepal.Width
[1] 3.056376

$Petal.Length
[1] 3.773826

$Petal.Width
[1] 1.199333

> lapply(iris1[1:4], median, na.rm = TRUE)
$Sepal.Length
[1] 5.8

$Sepal.Width
[1] 3

$Petal.Length
[1] 4.4

$Petal.Width
[1] 1.3
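The same idea works with `sapply()`, which the answer also mentions. A sketch, assuming both statistics are wanted in one table: an anonymous function returns a named vector for each column, and `sapply()` simplifies the results into a matrix with one column per variable.

```r
iris1 <- iris
iris1[2, 3] <- NA  # introduce a missing Petal.Length
iris1[3, 2] <- NA  # introduce a missing Sepal.Width

# Each column yields c(Mean = ..., Median = ...); sapply() binds these
# named vectors into a 2 x 4 matrix with rows "Mean" and "Median"
stats <- sapply(iris1[1:4], function(x) c(
  Mean   = mean(x, na.rm = TRUE),
  Median = median(x, na.rm = TRUE)
))
stats
```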
amrrs

This would be much easier if you just used `Filter()` and `Map()`. For example:

df.numeric.summary <- function(data.frame1){
    my_summary <- function(x) c(
      "Mean"=mean(x, na.rm = TRUE),
      "Median"=median(x, na.rm=TRUE),
      "IQR"=IQR(x, na.rm=TRUE))

    Map(my_summary, Filter(is.numeric, data.frame1))
}

You can test it with:

df.numeric.summary(iris)
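For comparison, the loop from the question can also be repaired directly, as MrFlick's comment under the question points out. This is a minimal sketch: it indexes the result by column name instead of position, so non-numeric columns leave no `NULL` gaps, and it drops the extra `list()` wrapper that produced the `[[4]][[1]]` nesting. The function name `df.numeric.summary.loop` and the data frame `df_example` are hypothetical, not taken from the question.

```r
# Loop-based variant: store each summary under its column name,
# so skipped (non-numeric) columns leave no NULL placeholders
df.numeric.summary.loop <- function(data.frame1) {
  variable.list <- list()
  for (col in names(data.frame1)) {
    x <- data.frame1[[col]]
    if (is.numeric(x)) {
      variable.list[[col]] <- c(
        "Mean"   = mean(x, na.rm = TRUE),
        "Median" = median(x, na.rm = TRUE),
        "IQR"    = IQR(x, na.rm = TRUE)
      )
    }
  }
  variable.list
}

# Hypothetical test data: one character column, two numeric columns, one NA
df_example <- data.frame(
  site = c("a", "b", "c", "d"),
  Pb1  = c(1, 2, NA, 4),
  Pb2  = c(10, 20, 30, 40)
)
res <- df.numeric.summary.loop(df_example)
res
```

Growing the list by name keeps the loop close to the original code; the `Filter()`/`Map()` version avoids the loop entirely and is usually the more idiomatic choice.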
MrFlick