I have a data frame df in R of the form

a,1
a,4
a,2
b,6
b,8
b,4
c,4
c,5
c,2

I want to represent df in the form

a,1,4,2
b,6,8,4
c,4,5,2

What is a fast way to do this transformation in R, especially for a much larger data frame?

hearse
  • This is really a [long to wide dupe](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format), but without column names or a sufficiently unique index. It's all fixable, e.g. `library(tidyverse); df %>% set_names(paste0('V', 1:2)) %>% group_by(V1) %>% mutate(var = paste0('X', row_number())) %>% spread(var, V2)` – alistaire Aug 08 '17 at 21:38

1 Answer

Using dplyr and reshape2, add a row index within each group, then cast to wide form:

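library(dplyr)
library(reshape2)
# Number the rows within each V1 group; this index supplies the wide column names
dat <- dat %>% group_by(V1) %>% dplyr::mutate(id = row_number())
# Cast to wide form: one row per V1 value, one column per id
as.data.frame(acast(dat, V1 ~ id, value.var = "V2"))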

  1 2 3
a 1 4 2
b 6 8 4
c 4 5 2
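
The same row-index idea can also be sketched with tidyr's `pivot_wider()`, the successor to `spread()`. This is an illustrative variant rather than part of the original answer; the names `dat` and `wide` and the column names `V1`/`V2` are assumed to match the data shown in this answer, and tidyr 1.0.0 or later is assumed.

```r
library(dplyr)
library(tidyr)

# Example data, assumed to match the question (columns V1 and V2)
dat <- data.frame(
  V1 = rep(c("a", "b", "c"), each = 3),
  V2 = c(1, 4, 2, 6, 8, 4, 4, 5, 2)
)

wide <- dat %>%
  group_by(V1) %>%
  mutate(id = row_number()) %>%   # position of each row within its group
  ungroup() %>%
  pivot_wider(names_from = id, values_from = V2)  # one column per position

wide
```

The result is a tibble with `V1` plus one column per within-group position; groups with fewer rows than the longest group would get `NA` in the missing positions.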

Data input:

dat
  V1 V2
1  a  1
2  a  4
3  a  2
4  b  6
5  b  8
6  b  4
7  c  4
8  c  5
9  c  2
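
For a reproducible starting point, the data above could be built as below. The sketch also shows a base R route that needs no packages; it is not from the original answer and assumes every group has the same number of rows (with unequal groups, `rbind` would recycle values). The name `wide_mat` is only illustrative.

```r
# Rebuild the example data shown above (columns V1 and V2)
dat <- data.frame(
  V1 = rep(c("a", "b", "c"), each = 3),
  V2 = c(1, 4, 2, 6, 8, 4, 4, 5, 2)
)

# split() gives one numeric vector per V1 value;
# rbind() stacks those vectors as rows, named by group
wide_mat <- do.call(rbind, split(dat$V2, dat$V1))

wide_mat
```

This returns a matrix whose row names are the group labels; wrap it in `as.data.frame()` if a data frame is needed.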

EDIT: Timing

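library(microbenchmark)
library(tidyr)  # provides spread()
# dat is the grouped data from above, with columns V1, V2 and id
microbenchmark(
    acastmethod = acast(dat, V1 ~ id, value.var = "V2"),
    dcastmethod = dcast(dat, V1 ~ id, value.var = "V2"),
    tidyrmethod = spread(dat, key = id, value = V2),
    xtabmethod = xtabs(V2 ~ V1 + id, data = dat)
)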


Unit: milliseconds
        expr      min       lq     mean   median       uq       max neval  cld
 acastmethod 1.872223 2.035528 2.237846 2.210701 2.349068  3.783507   100 a   
 dcastmethod 3.124578 3.405817 3.626199 3.579038 3.815807  4.887430   100  b  
 tidyrmethod 4.025684 4.477290 4.765803 4.725326 5.035862  6.140385   100   c 
  xtabmethod 5.054490 5.529382 6.378615 5.714020 6.047391 61.242200   100    d
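
For much larger data, data.table is another option that is often suggested. It was not part of the benchmark above, so its relative speed on this data is unmeasured here. A minimal sketch, assuming the same `V1`/`V2` columns; `dt` and `wide_dt` are illustrative names:

```r
library(data.table)

# Same example data as a data.table
dt <- data.table(
  V1 = rep(c("a", "b", "c"), each = 3),
  V2 = c(1, 4, 2, 6, 8, 4, 4, 5, 2)
)

# rowid(V1) numbers the rows within each V1 group,
# so no separate id column has to be added first
wide_dt <- dcast(dt, V1 ~ rowid(V1), value.var = "V2")

wide_dt
```

To compare it fairly, it could be added as another expression in the `microbenchmark()` call and run on data of realistic size.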
BENY