2

I have a very simple dataset with two variables.

data <- data.frame(
             ID = c("A","A","B","C","D","D"),
             Service = c("Shop","Online","Shop","Online","Online","Shop"))

I want to use the spread() function on Service, but rather than spread by ID, I want the resulting table to include a "Y" symbol to indicate that ID operates that specific service. For example:

  ID       Shop    Online
  A         Y         Y
  B         Y         - 
  C         -         Y
  D         Y         Y

However, the spread() function won't work by just giving a key, so is there a way I can do this using the existing spread function or do I have to use something different?

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Pryore
  • 510
  • 9
  • 22

2 Answers2

2

You would need to create a new column first

library(tidyr)
library(dplyr)
data %>% 
  mutate(spread_col = "Y") %>% 
  spread(Service, spread_col, fill = "-")
#  ID Online Shop
#1  A      Y    Y
#2  B      -    Y
#3  C      Y    -
#4  D      Y    Y

(This might be a duplicate of How to reshape data from long to wide format?)


You could also use dcast from data.table or reshape2:

reshape2::dcast(
  data,
  ID ~ Service,
  fun.aggregate = function(x) replace(x, x == x, "Y"),
  fill = "-"
)
markus
  • 25,843
  • 5
  • 39
  • 58
  • Sorry just one problem I'm finding now: I noticed I still have duplicate rows in my dataset, e.g. there are two rows for ID "A" and they are not merged. I keep getting Error: Duplicate identifiers for rows. Any ideas? – Pryore Oct 29 '18 at 09:46
  • 1
    No worries, I've just figured out that my main dataset had a couple of exact-match rows. Thank you for your help anyway :) – Pryore Oct 29 '18 at 09:59
0

You could do this with base.

data: (use a factor variable)

data <- data.frame(
    ID = c("A","A","B","C","D","D"),
    Service = factor(c("Shop","Online","Shop","Online","Online","Shop")), levels = c("Online","Shop"))

code:

ans<-
do.call(
    rbind, tapply(data$Service, data$ID, table)
)

ans[ans == 1] = "Y"
ans[ans == 0] = "-"

result:

#> ans
#  Online Shop
#A "Y"    "Y" 
#B "-"    "Y" 
#C "Y"    "-" 
#D "Y"    "Y" 
#> 
Andre Elrico
  • 10,956
  • 6
  • 50
  • 69