0

How can I collapse data in a wide format (see example below), into a concatenated column showing only the TRUE values? I want to end up with a data table in the format Employee Name | "string of applicable column headers" as illustrated in demoOUT.

library(data.table)
demoIN <- data.table(
  Name=c("Mike Jones","Bobby Fisher"),
  A=c(1,0),
  B=c(1,1),
  C=c(0,0),
  D=c(1,1))

           Name A B C D
1:   Mike Jones 1 1 0 1
2: Bobby Fisher 0 1 0 1

demoOUT <- data.table(
  Name=c("Mike Jones","Bobby Fisher"),
  Cases =c("A,B,D","B,D"))

           Name Cases
1:   Mike Jones A,B,D
2: Bobby Fisher   B,D
Mako212
  • 6,787
  • 1
  • 18
  • 37

3 Answers3

2

A solution uses functions from dplyr and tidyr. demoIN2 is the final output.

library(dplyr)
library(tidyr)

demoIN2 <- demoIN %>%
  gather(Cases, Value, -Name) %>%
  filter(Value == 1) %>%
  group_by(Name) %>%
  summarise(Cases = paste(Cases, collapse = ","))
www
  • 38,575
  • 12
  • 48
  • 84
2

Here is a base R solution if you were interested.

demoIN$Cases <- apply(demoIN[, -c("Name")], 1, function(x) paste(na.omit(ifelse(x == 1, names(x), NA)), collapse = ","))

demoIN <- demoIN[,c("Name","Cases")]
Matt Jewett
  • 3,249
  • 1
  • 14
  • 21
  • Nice. Interestly much slower than `dplyr/tidyr`, just tried with 1.2M rows and took 43.68 seconds, compared to 2.61 seconds with `dplyr/tidyr` – Mako212 Jul 14 '17 at 18:39
1

Here is an option using data.table (as the initial object is data.table

library(data.table)
melt(demoIN, id.var = 'Name')[value==1, .(Cases = paste(variable, collapse=',')), Name]
#           Name Cases
#1:   Mike Jones A,B,D
#2: Bobby Fisher   B,D
akrun
  • 874,273
  • 37
  • 540
  • 662