0

I have a file that basically looks like this.

1  A 
2  A 
2  B 
3  A 
3  B 
3  C 
4  A 
4  C 
...

I would like to have a file like this

1  A 
2  A  B 
3  A  B  C 
4  A  C 
...

I tried using the reshape tool in R, but it didn't work...

reshape(df, idvar = V1, timevar = V2, direction = "wide")

gave the following error:

In reshapeWide(data, idvar = idvar, timevar = timevar,  ... :  multiple rows match for V2=A: first taken 
In reshapeWide(data, idvar = idvar, timevar = timevar,  ... :  multiple rows match for V2=B: first taken   
In reshapeWide(data, idvar = idvar, timevar = timevar,  ... :  multiple rows match for V2=C: first taken 

A solution in R or linux is highly appreciated. Thank you!

r2evans
  • 141,215
  • 6
  • 77
  • 149
LVE
  • 15
  • 4
  • Do you intentionally *not* quote `V1` and `V2`? Do you have variables in your working environment with those names? – r2evans Nov 27 '18 at 16:09

1 Answers1

0
df <- read.table(header=FALSE, stringsAsFactors=FALSE, text="
1  A 
2  A 
2  B 
3  A 
3  B 
3  C 
4  A 
4  C ")

Method 1: dplyr

library(dplyr)
library(tidyr)
df %>%
  group_by(V1) %>%
  mutate(rn = row_number()) %>%
  spread(rn, V2)
# # A tibble: 4 x 4
# # Groups:   V1 [4]
#      V1 `1`   `2`   `3`  
#   <int> <chr> <chr> <chr>
# 1     1 A     <NA>  <NA> 
# 2     2 A     B     <NA> 
# 3     3 A     B     C    
# 4     4 A     C     <NA> 

Method 2: data.table

library(data.table)
DT <- as.data.table(df)[,rn := seq_len(.N),by="V1"]
dcast(DT, V1 ~ rn, value.var = "V2")
#    V1 1    2    3
# 1:  1 A <NA> <NA>
# 2:  2 A    B <NA>
# 3:  3 A    B    C
# 4:  4 A    C <NA>
r2evans
  • 141,215
  • 6
  • 77
  • 149