split columns according to sequence numbers

Question

I have a dataset like this:

seq X
1   a
2   b
3   c
1   d
2   e
1   f
2   g
3   h
4   i
5   j

And I would like to split/group the columns according to the assigned seq, like this:

seq X    seq1  X1   seq2 X2
1   a    1     d    1    f
2   b    2     e    2    g
3   c    NA    NA   3    h
NA  NA   NA    NA   4    i
NA  NA   NA    NA   5    j

Thank you in advance

score 0 · Answer 1 · edited May 23 '17 at 12:16


We first need to split the data frame at each row where `seq` restarts at 1, and then apply a custom function that column-binds data frames with unequal numbers of rows, i.e.

do.call(cbindPad, split(df, cumsum(df$seq == 1)))

#  1.seq  1.X 2.seq  2.X 3.seq 3.X
#1     1    a     1    d     1   f
#2     2    b     2    e     2   g
#3     3    c    NA <NA>     3   h
#4    NA <NA>    NA <NA>     4   i
#5    NA <NA>    NA <NA>     5   j

where `cbindPad` was taken from @joran's answer at this post
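Since `cbindPad` is not defined in this answer, here is a minimal sketch of what such a helper could look like. It is a stand-in written for illustration and is not necessarily the code from the linked post; the construction of `df` is assumed from the data shown in the question.

```r
# Example data, assumed from the question.
df <- data.frame(
  seq = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5),
  X   = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  stringsAsFactors = FALSE
)

# Illustrative stand-in for cbindPad (assumption: not the linked original).
# Pads every data frame with NA rows up to the longest one, then binds columns.
cbindPad <- function(...) {
  pieces <- list(...)
  target <- max(vapply(pieces, nrow, integer(1)))
  padded <- lapply(pieces, function(piece) {
    # Row indices beyond nrow(piece) produce all-NA rows.
    piece <- piece[seq_len(target), , drop = FALSE]
    rownames(piece) <- NULL
    piece
  })
  do.call(cbind, padded)
}

# A new group starts wherever seq equals 1.
groups <- cumsum(df$seq == 1)   # 1 1 1 2 2 3 3 3 3 3
res <- do.call(cbindPad, split(df, groups))
```

The list names produced by `split` ("1", "2", "3") are carried through to `cbind`, which is where column prefixes such as `1.seq` come from.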

edited May 23 '17 at 12:16

Community


answered Jan 04 '17 at 15:44

Sotos



Great piece of code... I was stuck on how to `split` the data. Sorry that I can't upvote this, since I have reached my vote limit. – joel.wilson Jan 04 '17 at 16:10
I was wondering whether we could avoid `cbindPad`. I was trying that as well. – joel.wilson Jan 04 '17 at 16:19
@joel.wilson but how will you bind them then? Unless you mean leaving them in a list (which is not what the OP wants) – Sotos Jan 04 '17 at 16:29
Would you check my steps? Would this be feasible? – joel.wilson Jan 04 '17 at 16:33

score 0 · Answer 2 · answered Jan 04 '17 at 16:32


This was just for exploration. @Sotos, would something like this work? By the way, it involves a lot of transposing, which is not efficient.

library(plyr)  # provides rbind.fill
df1 = split(df, cumsum(df$seq == 1))
df2 = lapply(df1, function(x) as.data.frame(t(x)))
#$`1`
#    V1 V2 V3
#seq  1  2  3
#X    a  b  c

#$`2`
#    V1 V2
#seq  1  2
#X    d  e

#$`3`
#    V1 V2 V3 V4 V5
#seq  1  2  3  4  5
#X    f  g  h  i  j

data.frame(t(rbind.fill(df2)))
#     X1   X2   X3   X4 X5 X6
#V1    1    a    1    d  1  f
#V2    2    b    2    e  2  g
#V3    3    c <NA> <NA>  3  h
#V4 <NA> <NA> <NA> <NA>  4  i
#V5 <NA> <NA> <NA> <NA>  5  j
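For comparison, the same padding can be sketched without any transposing by extending each column to a common length with `length<-`, which fills the extra positions with `NA`. This is only an illustrative alternative in base R, not the method of either answer; `pieces`, `target`, and `padded` are hypothetical names, and `df` is assumed from the question's data.

```r
# Example data, assumed from the question.
df <- data.frame(
  seq = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5),
  X   = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  stringsAsFactors = FALSE
)

# Split wherever seq restarts at 1, as in the answers above.
pieces <- split(df, cumsum(df$seq == 1))
target <- max(vapply(pieces, nrow, integer(1)))

# Extend every column of every piece to `target` elements; new slots are NA.
padded <- lapply(pieces, function(piece) {
  as.data.frame(lapply(piece, `length<-`, target), stringsAsFactors = FALSE)
})

res2 <- do.call(cbind, padded)
```

Because the columns stay as they are, `seq` remains numeric here, whereas the transposing approach turns every value into a string.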

answered Jan 04 '17 at 16:32

joel.wilson


Yes, of course. You are just following another method from the post I took `cbindPad` from. There is a lot of `t`, which makes it a bit heavy, but it does get to the core of the problem. – Sotos Jan 04 '17 at 16:44
@Sotos I actually didn't check those answers, but anyway. – joel.wilson Jan 04 '17 at 16:47
What I meant was that you too had to follow the "fill_withNA" type of process – Sotos Jan 04 '17 at 16:50
Excellent! Thank you for your help:) – Jan 04 '17 at 18:26
