0

Trying to simplify a huge and redundant dataset and would like your help with moving cells around so each row is a different "group" according to the value in column 1, with added columns for each unique OLD row cell/element that matches that group value. See below.

What I have:

col1 col2
1    a
1    b
1    c
1    d
2    a
2    c
2    d
2    e
3    a
3    b
3    d
3    e

What I want:

col1 col2 col3 col4 col5 col6
1    a    b    c    d    N/A
2    a    N/A  c    d    e
3    a    b    N/A  d    e 

I hope this isn't too vague but I will update this question as soon as get notification of replies.

Thanks in advance!

Jordan Garner
  • 147
  • 1
  • 2
  • 9

2 Answers2

1

We could use dcast from library(reshape2) to convert from 'long' to 'wide' format. By default, it will take the value.var='col2'. If there are more columns, we can explicitly specify the value.var.

library(reshape2)
dcast(df1, col1~ factor(col2, labels=paste0('col', 2:6)))
#  col1 col2 col3 col4 col5 col6
#1    1    a    b    c    d <NA>
#2    2    a <NA>    c    d    e
#3    3    a    b <NA>    d    e
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Ah, thank you for this. Didn't know there was a reshape library that can help me go from long to wide format, will try it soon – Jordan Garner Sep 13 '15 at 06:14
0

Here is another way that uses reshape of the stats package,

x<-data.frame(col1 = c(1,1,1,1,2,2,2,2,3,3,3,3), 
col2 = c('a','b','c','d','a','c','d', 'e', 'a', 'b', 'd', 'e'))

x<-reshape(x, v.names="col2", idvar="col1", timevar="col2", direction="wide")
names(x)<-c('col1', 'col2', 'col3', 'col4', 'col5', 'col6')

Output:

    col1 col2 col3 col4 col5 col6
1    1    a    b    c    d <NA>
5    2    a <NA>    c    d    e
9    3    a    b <NA>    d    e
small_data88
  • 380
  • 1
  • 10