I want to build a data frame of summary statistics for every column of a data frame using the summarize_all() function. The data frame contains NA values.

Input:

    a b
    0 1

Output:

    a.min a.max a.mean a.sd b.min b.max b.mean b.sd
    0     0     0      0    1     1     1      1

Code:

    df <- df %>%
      summarize_all(funs(min, max, mean, sd))

How do I handle NA values in this code?


1 Answer

In the data frame posted in the question, the missing values in the result (shown as NaN) arise because each column has only one observation, which is not enough to compute a standard deviation. See below:

> df1<-data.frame("a"=0,"b"=1)
> df1%>%   
+   summarize_all(funs( min , max ,mean, sd))
  a_min b_min a_max b_max a_mean b_mean a_sd b_sd
1     0     1     0     1      0      1  NaN  NaN

> df2<-data.frame("a"=c(0,1,2,3),"b"=c(1,3,5,7))
> df2%>%   
+   summarize_all(funs( min , max ,mean, sd))
  a_min b_min a_max b_max a_mean b_mean     a_sd     b_sd
1     0     1     3     7    1.5      4 1.290994 2.581989
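
To see why one observation is not enough, recall that the sample standard deviation divides by n - 1. A minimal sketch in base R, with illustrative variable names that are not part of the question:

```r
# One observation, as in column a of df1
x <- 0
n <- length(x)

# Sample standard deviation by hand: the denominator n - 1 is 0,
# so the variance is 0 / 0, which R evaluates to NaN
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))
is.nan(manual_sd)

# The built-in sd() likewise has no defined value for a single observation
is.na(sd(x))
```

With two or more observations, as in df2, the denominator is positive and the result is defined.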

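If the dataset itself contains NA values, pass na.rm=T (equivalently na.rm=TRUE) to summarize_all(). It is forwarded to each summary function, which then drops the NA values before computing: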

> df3<-data.frame("a"=c(0,1,NA,3),"b"=c(1,3,5,7))

# with na.rm=T
> df3%>%   
+   summarize_all(funs( min , max ,mean, sd),na.rm=T)
  a_min b_min a_max b_max   a_mean b_mean     a_sd     b_sd
1     0     1     3     7 1.333333      4 1.527525 2.581989

# without na.rm=T
> df3%>%   
+   summarize_all(funs( min , max ,mean, sd))
  a_min b_min a_max b_max a_mean b_mean a_sd     b_sd
1    NA     1    NA     7     NA      4  NaN 2.581989
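
In newer dplyr releases, funs() is deprecated (since 0.8.0) and summarise_all() is superseded by across() (since 1.0.0). The following is a sketch of the same summary in that style, assuming dplyr 1.0.0 or later is installed; the names `stats` and `res` are illustrative only:

```r
library(dplyr)

df3 <- data.frame(a = c(0, 1, NA, 3), b = c(1, 3, 5, 7))

# Named list of lambdas; each one drops NA values before computing
stats <- list(
  min  = ~ min(.x, na.rm = TRUE),
  max  = ~ max(.x, na.rm = TRUE),
  mean = ~ mean(.x, na.rm = TRUE),
  sd   = ~ sd(.x, na.rm = TRUE)
)

# Apply every function in the list to every column
res <- df3 %>%
  summarise(across(everything(), stats))

names(res)
```

Here the result columns are named column_function and grouped by column (a_min, a_max, a_mean, a_sd, then the b columns), rather than grouped by function as in the summarize_all() output above.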