Consolidate rows by group value

Question

Trying to simplify a huge and redundant dataset and would like your help with moving cells around so each row is a different "group" according to the value in column 1, with added columns for each unique OLD row cell/element that matches that group value. See below.

What I have:

What I want:

col1 col2 col3 col4 col5 col6
1    a    b    c    d    N/A
2    a    N/A  c    d    e
3    a    b    N/A  d    e

I hope this isn't too vague but I will update this question as soon as get notification of replies.

Thanks in advance!

`library(tidyr); spread(x, col2, col2)` – bergant Sep 12 '15 at 00:14 — bergant, Sep 12 '15 at 00:14

score 1 · Answer 1 · answered Sep 12 '15 at 01:33

1

We could use dcast from library(reshape2) to convert from 'long' to 'wide' format. By default, it will take the value.var='col2'. If there are more columns, we can explicitly specify the value.var.

library(reshape2)
dcast(df1, col1~ factor(col2, labels=paste0('col', 2:6)))
#  col1 col2 col3 col4 col5 col6
#1    1    a    b    c    d <NA>
#2    2    a <NA>    c    d    e
#3    3    a    b <NA>    d    e

answered Sep 12 '15 at 01:33

akrun

874,273
37
540
662

1

Ah, thank you for this. Didn't know there was a reshape library that can help me go from long to wide format, will try it soon – Jordan Garner Sep 13 '15 at 06:14

small_data88 · Answer 2 · 2015-09-12T01:52:45.483

Here is another way that uses reshape of the stats package,

x<-data.frame(col1 = c(1,1,1,1,2,2,2,2,3,3,3,3), 
col2 = c('a','b','c','d','a','c','d', 'e', 'a', 'b', 'd', 'e'))

x<-reshape(x, v.names="col2", idvar="col1", timevar="col2", direction="wide")
names(x)<-c('col1', 'col2', 'col3', 'col4', 'col5', 'col6')

Output:

    col1 col2 col3 col4 col5 col6
1    1    a    b    c    d <NA>
5    2    a <NA>    c    d    e
9    3    a    b <NA>    d    e

Consolidate rows by group value

2 Answers2