Dynamically Select Columns and Summaries Data

Question

I have the following data frame

ID <- c(1,1,2,3,4,5,6)
Value1 <- c(20,50,30,10,15,10,NA)
Value2 <- c(40,33,84,NA,20,1,NA)
Value3 <- c(60,40,60,10,25,NA,NA)
Grade1 <- c(20,50,30,10,15,10,NA)
Grade2 <- c(40,33,84,NA,20,1,NA)

DF <- data.frame(ID,Value1,Value2,Value3,Grade1,Grade2)

  ID Value1 Value2 Value3 Grade1 Grade2
1  1     20     40     60     20     40
2  1     50     33     40     50     33
3  2     30     84     60     30     84
4  3     10     NA     10     10     NA
5  4     15     20     25     15     20
6  5     10      1     NA     10      1
7  6     NA     NA     NA     NA     NA

I would like to group by ID, select columns with names contain the string ("Value"), and get the mean of these columns with NA not included.

Here is an example of the desired output

ID     mean(Value)
1      41
2      58
3      10
....

In my attempt to solve this challenge, I wrote the following code

Library(tidyverse)    
DF %>% group_by (ID) %>% select(contains("Value")) %>% summarise(mean(.,na.rm = TRUE))

The code groups the data by IDs, select columns with column name containing ("Value"), and attempts to summarise the selected columns by using the mean function. When I run my code, I get the following output

> DF %>% group_by (ID) %>% select(contains("Value")) %>% summarise(mean(.))
Adding missing grouping variables: `ID`
# A tibble: 6 x 2
     ID `mean(.)`
  <dbl>     <dbl>
1     1        NA
2     2        NA
3     3        NA
4     4        NA
5     5        NA
6     6        NA

I would appreciate your help in this manner.

score 1 · Accepted Answer · 2019-09-15T09:07:02.550

1

You should try using pivot_longer to get your data from wide to long form Read latest tidyR update on pivot_longer & pivot_wider (https://tidyr.tidyverse.org/articles/pivot.html)

library(tidyverse)

ID <- c(1,2,3,4,5,6)
Value1 <- c(50,30,10,15,10,NA)
Value2 <- c(33,84,NA,20,1,NA)
Value3 <- c(40,60,10,25,NA,NA)

DF <- data.frame(ID,Value1,Value2,Value3)

DF %>% pivot_longer(-ID) %>% 
  group_by(ID) %>% summarise(mean=mean(value,na.rm=TRUE))

Output here

   ID  mean
  <dbl> <dbl>
1     1  41  
2     2  58  
3     3  10  
4     4  20  
5     5  5.5
6     6 NaN

edited Sep 15 '19 at 09:07

answered Sep 15 '19 at 08:28

Could please explain why are you using pivot functions? Is it possible to solve this issue by grouping? – DDS AMP Sep 15 '19 at 10:00
I have used grouping but after pivoting because your data is in wide format. To group the data, we need it in long form (Read this for Wide Vs Long data understanding https://discuss.analyticsvidhya.com/t/difference-between-wide-and-long-data-format/8110#targetText=For%20converting%20data%20to%20wide,in%20R%20use%20Reshape2%20package%20.&targetText=Wide%20data%20has%20a%20column,the%20values%20of%20those%20variables.) – Sep 15 '19 at 11:16

score 0 · Answer 2 · answered Sep 15 '19 at 08:34

0

Without using dplyr or any specific package, this would help :

DF$mean<- rowMeans(DF[,c(2:4)], na.rm = T)

answered Sep 15 '19 at 08:34

S.Gradit

164
1
9

Your solution assumes columns' index is static. I want to avoid this issue by searching using column name. – DDS AMP Sep 15 '19 at 10:01
If ID is always present you can use : DF$avg <- rowMeans(DF %>% select(-ID), na.rm = T) – S.Gradit Sep 15 '19 at 10:19

Dynamically Select Columns and Summaries Data

2 Answers2