-1

let's say

df:
user  actions
 1      A
 1      B
 1      c
 2      A
 2      D
 3      B
 4      C
 4      D

I want to convert to this format

new_df:
user  action1 action2 action3 
 1      A       B       C
 2      A       D       NA
 3      B       NA      NA
 4      C       D       NA

please note that the number of columns in new_df is equal to the max number of actions among users. it should insert NA if the user is less that max actions. how can I do it?

Cina
  • 9,759
  • 4
  • 20
  • 36

2 Answers2

2

You can use rle to create a column to store action1, action2, etc. Then use dcast from data.table package to turn the data into a wide format.

df$coln <- paste0("actions", unlist(lapply(rle(df$user)$lengths, seq_len)))
data.table::dcast(df, user ~ coln,  value.var="actions")

In response to OP's comment, you can pad the beginning with 0 as follows:

df$coln <- paste0("actions", sprintf("%02d", unlist(lapply(rle(df$user)$lengths, seq_len))))

Using data.table package:

df <- read.table(text="user  actions
1      A
1      B
1      C
1      D
1      E
1      F
1      G
1      H
1      I
1      J
1      K
2      A
2      D
3      B
4      C
4      D", header=TRUE)

library(data.table)
setDT(df)
dcast(setDT(df)[, coln := sprintf("actions%02d", seq_len(.N)), by=.(user)], 
    user ~ coln,  value.var="actions")
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • it works thx. but small issue. it sorts columns like `action1, action11,action12... action2,action21...` How can I fix the column arrangement? – Cina Apr 12 '18 at 04:16
  • i have so many actions in my real dataframe – Cina Apr 12 '18 at 04:17
1

A solution using tidyverse approach

df <- read.table(text = "user  actions
 1      A
 1      B
 1      c
 2      A
 2      D
 3      B
 4      C
 4      D", header = TRUE)

library(tidyr)
library(dplyr)

df %>% 
  group_by(user) %>% 
  mutate(index = paste0("action", row_number())) %>% 
  spread(index, actions) 

#> # A tibble: 4 x 4
#> # Groups:   user [4]
#>    user action1 action2 action3
#>   <int> <fct>   <fct>   <fct>  
#> 1     1 A       B       c      
#> 2     2 A       D       <NA>   
#> 3     3 B       <NA>    <NA>   
#> 4     4 C       D       <NA>

Created on 2018-04-11 by the reprex package (v0.2.0).

Tung
  • 26,371
  • 7
  • 91
  • 115
  • it give me an error: 'Error: 'group_vars' is not an exported object from 'namespace:dplyr'' – Cina Apr 12 '18 at 04:04
  • What're the version of `dplyr` & `tidyr` you are using? You might need to restart `R/Rstudio` then run `install.packages(c("dplyr", "tidyr"), dependencies=TRUE)` to update those packages – Tung Apr 12 '18 at 04:06