2

I have the following data set:

sample.data <- data.frame(Step = c(1,2,3,4,1,2,1,2,3,1,1),
                          Case = c(1,1,1,1,2,2,3,3,3,4,5),
                          Decision = c("Referred","Referred","Referred","Approved","Referred","Declined","Referred","Referred","Declined","Approved","Declined"),
                          Reason = c("Docs","Slip","Docs","","Docs","","Slip","Docs","","",""))

sample.data

      Step Case Decision Reason
1     1    1    Referred Docs
2     2    1    Referred Slip
3     3    1    Referred Docs
4     4    1    Approved
5     1    2    Referred Docs
6     2    2    Declined
7     1    3    Referred Slip
8     2    3    Referred Docs
9     3    3    Declined
10    1    4    Approved
11    1    5    Declined

Is it possible in R to translate this into a wide table format, with the decisions on the header, and the value of each cell being the count of the occurrence, for example:

Case    Referred    Approved    Declined    Docs     Slip
 1          3           1           0        2        0
 2          1           0           1        1        0
 3          2           0           1        1        1
 4          0           1           0        0        0
 5          0           0           1        0        0
Jaap
  • 81,064
  • 34
  • 182
  • 193
dGecko
  • 153
  • 2
  • 7

3 Answers3

3
library(reshape2)

df1 <- dcast(sample.data, Case~Decision+Reason)
names(df1)[2:5] <- c("Approved", "Declined", "Docs", "Slip")
df1$Referred <- df1$Docs + df1$Slip

df1
#    Case Approved Declined Docs Slip Referred
# 1:    1        1        0    2    1        3
# 2:    2        0        1    1    0        1
# 3:    3        0        1    1    1        2
# 4:    4        1        0    0    0        0
# 5:    5        0        1    0    0        0
Ven Yao
  • 3,680
  • 2
  • 27
  • 42
  • As I have seen with most long to wide format questions on here, this does not produce the result I am looking for. The `dcast` function gives the results for a combination of the 2 factors, where my output keeps them separate still. – dGecko Dec 22 '15 at 16:32
  • Yeah. You're right. Maybe you need to work with the result of `dcast`. Maybe the desired output you showed was not correct. Please check. – Ven Yao Dec 22 '15 at 16:42
  • Good use of `dcast`, Plus one! – akrun Dec 22 '15 at 16:50
2

We can use gather/spread from tidyr

 library(tidyr)
 library(dplyr)
 gather(sample.data, Var, Val, 3:4) %>%
           group_by(Case, Val) %>% 
           summarise(n=n()) %>%
           filter(Val!='') %>% 
           spread(Val, n, fill=0)

#   Case Approved Declined  Docs Referred  Slip
#   (dbl)    (dbl)    (dbl) (dbl)    (dbl) (dbl)
#1     1        1        0     2        3     1
#2     2        0        1     1        1     0
#3     3        0        1     1        2     1
#4     4        1        0     0        0     0
#5     5        0        1     0        0     0
akrun
  • 874,273
  • 37
  • 540
  • 662
2

Using:

library(reshape2)
tmp <- melt(sample.data, id.var=c("Step", "Case"))
tmp <- tmp[tmp$value!="",]

dcast(tmp, Case ~ value, value.var="Case", length)

you get:

   Case Approved Declined Docs Referred Slip
1:    1        1        0    2        3    1
2:    2        0        1    1        1    0
3:    3        0        1    1        2    1
4:    4        1        0    0        0    0
5:    5        0        1    0        0    0

Using the data.table-package, you can use the same melt and dcast functionality as with reshape2, but you don't need a temporary dataframe:

library(data.table)
dcast(melt(setDT(sample.data), id.var=c("Step", "Case"))[value!=""],
      Case ~ value, value.var="Case", length)

which will give you the same result.

Jaap
  • 81,064
  • 34
  • 182
  • 193