
I need to create new columns based on whether a user has completed an action at least once.

 USER ACTION
 A    Attack
 A    Jump
 B    Attack
 B    Die
 C    Attack
 C    Die
 C    Jump
 D    Die

The desired result would look something like:

 ## If ACTION == something:
 ##   create a new column and set it to 1 on every row for that user

 USER ACTION HAS_DIED HAS_JUMPED HAS_ATTACKED
 A    Attack    0         1            1
 A    Jump      0         1            1
 B    Attack    1         0            1
 B    Die       1         0            1
 C    Attack    1         1            1
 C    Die       1         1            1
 C    Jump      1         1            1
 D    Die       1         0            0

so that I can end up with a unique USER list:

 USER  HAS_DIED HAS_JUMPED HAS_ATTACKED
 A       0         1            1
 B       1         0            1
 C       1         1            1
 D       1         0            0

I've been filtering and merging once per feature, as below, but that gets cumbersome with a large number of features. For example:

 library(data.table)

 ## get the unique list of users that have died (one row per user)
 died_df <- unique(df[ACTION == "Die", .(USER, HAS_DIED = 1)])

 ## merge back onto every row of df and change the non-matches (NA) to 0
 merged_df <- died_df[df, on = "USER"]
 merged_df[is.na(HAS_DIED), HAS_DIED := 0]

Looking for a faster and more efficient way to do this!

ant
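For reference, a minimal data.table sketch of the two steps the question describes (flag every row of a user who did an action at least once, then collapse to one row per user). It swaps the per-feature filter-and-merge for a grouped `any()` by USER, so each new feature costs one line in a lookup vector rather than a new merge. The `flag_map` vector is a hypothetical helper, not something from the question.

```r
library(data.table)

# The question's example data
df <- data.table(
  USER   = c("A", "A", "B", "B", "C", "C", "C", "D"),
  ACTION = c("Attack", "Jump", "Attack", "Die", "Attack", "Die", "Jump", "Die")
)

# Hypothetical lookup: ACTION value -> name of the flag column to create
flag_map <- c(Die = "HAS_DIED", Jump = "HAS_JUMPED", Attack = "HAS_ATTACKED")

# For each action, set the flag to 1 on every row of a user who did it at
# least once (any() is evaluated per USER group), otherwise 0
for (act in names(flag_map)) {
  df[, (flag_map[[act]]) := as.integer(any(ACTION == act)), by = USER]
}

# Collapse to one row per user, keeping only USER and the flag columns
users <- unique(df[, c("USER", unname(flag_map)), with = FALSE])

print(df)
print(users)
```

Because `:=` modifies `df` in place, the per-row table is available as well as the per-user summary.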

2 Answers


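Since the initial object is a data.table, we can use dcast from data.table, which is also very efficient. With length as the aggregate, each cell is the number of times the user performed that action (0 or 1 for this data).

library(data.table)
# one row per USER, one column per ACTION value (sorted: Attack, Die, Jump)
setnames(dcast(setDT(df), USER ~ ACTION, length, value.var = "ACTION"), -1,
         c('HAS_ATTACKED', 'HAS_DIED', 'HAS_JUMPED'))[]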
#    USER HAS_ATTACKED HAS_DIED HAS_JUMPED
#1:    A            1        0          1
#2:    B            1        1          0
#3:    C            1        1          1
#4:    D            0        1          0
akrun
    Thanks, this is exactly what I needed; I always forget how powerful dcast is! The column names didn't really need to be changed, so I'm going to use dcast(df, USER ~ ACTION, function(x) 1, fill = 0). – ant Jun 06 '17 at 06:17
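The difference between `length` in the answer and the constant `function(x) 1` in the comment only shows up when a user performs the same action more than once: `length` then returns a count rather than a 0/1 flag. A minimal sketch on a hypothetical log (`log_dt` is made up for illustration) in which user A attacks twice:

```r
library(data.table)

# Hypothetical log in which user A attacks twice
log_dt <- data.table(
  USER   = c("A", "A", "A", "B", "B"),
  ACTION = c("Attack", "Attack", "Jump", "Attack", "Die")
)

# length counts rows per USER/ACTION cell, so A's Attack cell is 2;
# empty cells get length() of a zero-length vector, i.e. 0
counts <- dcast(log_dt, USER ~ ACTION, fun.aggregate = length,
                value.var = "ACTION")

# A constant aggregate (as in the comment) yields a 0/1 flag instead;
# fill = 0 supplies the value for cells with no rows
flags <- dcast(log_dt, USER ~ ACTION, fun.aggregate = function(x) 1,
               fill = 0, value.var = "ACTION")

print(counts)
print(flags)
```

Passing `value.var` explicitly also avoids the message dcast prints when it has to guess the value column.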

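Using dplyr and tidyr:

library(dplyr)
library(tidyr)

df %>% 
  distinct(USER, ACTION) %>%  # spread() needs unique USER/ACTION pairs
  mutate(n = 1) %>% 
  spread(ACTION, n, fill = 0) %>%
  setNames(c('USER', 'HAS_ATTACKED', 'HAS_DIED', 'HAS_JUMPED'))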

#   USER HAS_ATTACKED HAS_DIED HAS_JUMPED
# 1    A            1        0          1
# 2    B            1        1          0
# 3    C            1        1          1
# 4    D            0        1          0
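spread() still works but is superseded in current tidyr by pivot_wider(). A minimal sketch of the same reshape, assuming tidyr 1.1 or later (where `values_fill` accepts a single value) and rebuilding the question's example data as a plain data.frame:

```r
library(dplyr)
library(tidyr)

# The question's example data
df <- data.frame(
  USER   = c("A", "A", "B", "B", "C", "C", "C", "D"),
  ACTION = c("Attack", "Jump", "Attack", "Die", "Attack", "Die", "Jump", "Die"),
  stringsAsFactors = FALSE
)

flags <- df %>%
  distinct(USER, ACTION) %>%   # one row per USER/ACTION pair
  mutate(n = 1L) %>%           # marker value to spread into columns
  pivot_wider(names_from = ACTION, values_from = n, values_fill = 0L) %>%
  rename(HAS_ATTACKED = Attack, HAS_DIED = Die, HAS_JUMPED = Jump) %>%
  select(USER, HAS_ATTACKED, HAS_DIED, HAS_JUMPED) %>%
  arrange(USER)

print(flags)
```

Renaming by name rather than by position matters here: by default pivot_wider() orders the new columns by first appearance (Attack, Jump, Die), not alphabetically, so a positional setNames() would mislabel them.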
Adam Quek