0

I have demographic survey data in which responses for variable X were recorded into 3 separate columns. col1 contains only 1, col2 only 2 and col3 contains everything else. There is no overlap of responses so when the respondent answered "1" cells for col2 and col3 in that same row are empty. I'd like to combine all 3 columns (there are 3000+ responses so doing this manually is out of the question). I should note that there are 100+ columns in this data frame and I will need to use this on other variables as well.

I attempted gather() and apply() commands without much luck.

(see this question for reference: r collapsing data from multiple columns into one)

The apply command combined EVERY column, so I'm guessing I just need to fix my parameters? Wasn't sure how to do this (a bit of a newbie here).

df$New <- apply(df, 1, function(x) paste(x, collapse = ","))

Thanks for any help/advice!

astrofunkswag
  • 2,608
  • 12
  • 25
emsil
  • 1
  • Welcome to SO, please share a portion of your data with `dput(head(df))`. Check out [this link](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to ask a good R question, it's pretty hard to tell from your post what it is that you're asking for. – astrofunkswag May 29 '19 at 20:14
  • `apply(df[,1:3],...` will do it using just columns 1 to 3 – Andrew Gustar May 29 '19 at 20:15
  • @astrofunkswag thanks! will do for next time – emsil May 30 '19 at 13:42
  • @AndrewGustar Thank you! – emsil May 30 '19 at 13:43

1 Answers1

0

Can you use gather() %>% summarise(paste(..., collapse = ", "))? I am using the example you linked to above.

library(tidyverse)

df <- 
  tibble(
    id = 1:4,
    variable_1 = c('Var1', NA, NA,'Var1'),
    variable_2 = c('Var2', 'No', NA, NA),
    variable_3 = c(NA, NA, 'Var3', NA),
    variable_4 = c(NA, 'Var4', NA, NA),
    variable_5 = c(NA, 'No', 'Var5', NA),
    variable_6 = c(NA, NA, 'Var6', NA) 
  )


df %>% 
  # pivot everything besides id
  gather(var, response, -id) %>%
  # optional
  filter(!is.na(response)) %>% 
  # the method
  group_by(id) %>% 
  summarise(sentence = paste(response, collapse = ", ")) %>% 
  ungroup()

#    id sentence        
#     1 Var1, Var2      
#     2 No, Var4, No    
#     3 Var3, Var5, Var6
#     4 Var1   
yake84
  • 3,004
  • 2
  • 19
  • 35