0

I have a data frame:

df <- data.frame(id = c("a","a","a","b","b","b"), survey = rep("1a",6), q1 = c(NA,NA,"y","n",NA,NA),q2 = c("y",NA,NA,NA,"y",NA), q3 = c(NA,"n",NA,NA,NA,"y"))

These are survey data which I need to collapse to one for each id and survey. I can get close by:

df %>% group_by(id, survey) %>% summarize_all(toString)
Source: local data frame [2 x 5]
Groups: id [?]

      id survey        q1        q2        q3
    <fctr> <fctr>     <chr>     <chr>     <chr>
1      a     1a NA, NA, y y, NA, NA NA, n, NA
2      b     1a n, NA, NA NA, y, NA NA, NA, y

What I really need is:

  id survey q1 q2 q3
1  a     1a  y  y  n
2  b     1a  n  y  y

The real data frame is fairly large (1.2 million records).

Fundamentally different from suppress NAs in paste(). Answer to my question not found there.

Community
  • 1
  • 1
MikeW
  • 1
  • 1
  • 2
  • 1
    Based on the example showed `df %>% group_by(id, survey) %>% summarise_all(na.omit)`, but if there are many elementts then `df %>% group_by(id, survey) %>% summarise_all(funs(toString(unique(.[!is.na(.)]))))` – akrun Mar 22 '17 at 15:25
  • Possible duplicate of [suppress NAs in paste()](http://stackoverflow.com/questions/13673894/suppress-nas-in-paste) – Sotos Mar 22 '17 at 15:27
  • Thanks, akrun. Worked like a charm. Also, learned about the . functionality with %>% from you post - huge bonus. – MikeW Mar 22 '17 at 16:43

2 Answers2

0

A solution using base commands:

for (i in 3:5) {df[,i] <- ifelse(df[,i] == "y", 1, 
                                 ifelse(df[,i] == "n", 0, df[,1]))}
df2 <- data.frame(aggregate(q1 ~ id + survey, df, FUN = sum), 
                  aggregate(q2 ~ id + survey, df, FUN = sum)[3], 
                  aggregate(q3 ~ id + survey, df, FUN = sum)[3])

Not sure how quick it'd be for a large df or compared to dplyr. You may also want to replace the data.frame() call with merge()s if you expect missing replies for some id+survey combinations.

Andrew Cheesman
  • 140
  • 1
  • 10
0

The simpliest solution is:

df %>% group_by(id, survey) %>% summarise_all(na.omit)
G. Becker
  • 46
  • 3