group unique rows in a dataframe by ID, and stack related values as new column

Question

I'm trying to transform a data frame in which individuals are grouped by family ID. Instead of:

FAMID1 ID1  Age  Sex  Q1  Q2 ......
21     1    18   M    T   4
21     2    21   F    F   2
22     1    13   F    T   3
22     2    16   F    F   1
23     1    18   M    T   3
23     2    18   M    T   3

I want to get:

FAMID  ID1  ID2  Age1  Age2  Sex1  Sex2 Q1.1. Q1.2....
21     1    2    18    21     M     F    T     F
22     1    2    13    16     F     F    T     F
23     1    2    18    18     M     M    T     T

I've tried using melt (reshape2) and mutate (ddply) but I have a few hundred variables and I can't get them all into the same line of the data frame. I don't want to sum variables like aggregate, I just want to stack one line in with another and change all of the variable names to reflect which member of the family it is.

You'll probably need to give a concrete example that can be reproduced easily. Some guidance: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 Without one, mostly we could say "look at `melt` et al", but you already know about those. Maybe that's what this is, but the `...` make it look pseudo. — Frank, Dec 20 '16 at 18:21
Sorry, new to stack overflow. What do you mean by concrete example? — Dan Lordan, Dec 20 '16 at 18:24
Sorry, yeah, I meant that it should be easily reproduced (preferably possible via copy paste into an R console) and that the desired output should correspond precisely to the input. (The latter may hold true here, but it's not obvious to me, since you have added ellipses that make this look like a sketch corresponding to some larger example with other columns not explicitly listed.) — Frank, Dec 20 '16 at 18:36
Did you type your expected output wrong? Looks like Q2.2 column supposed to be Q1.2 ? — user5249203, Dec 20 '16 at 19:25

score 1 · Answer 1 · answered Dec 20 '16 at 18:51

1

reshape(direction = "wide", data=df, idvar = c("FAMID1"), 
        v.names = c("ID","Age","Sex","Q1","Q2"), timevar = "ID")

#   FAMID1 ID.1 Age.1 Sex.1 Q1.1 Q2.1 ID.2 Age.2 Sex.2  Q1.2 Q2.2
#1:     21    1    18     M TRUE    4    2    21     F FALSE    2
#2:     22    1    13     F TRUE    3    2    16     F FALSE    1
#3:     23    1    18     M TRUE    3    2    18     M  TRUE    3

answered Dec 20 '16 at 18:51

joel.wilson

8,243
5
28
48

Is there a way to continue the v.names out to the end of the data set? I.e. could you make the second line v.names = c ("ID" : "Q2")? I've got a few hundred variables so would like to shorten if possible - that's where I'm running into trouble. – Dan Lordan Dec 20 '16 at 19:35
@DanLordan why don't you replace it with `v.names = colnames(df)[-1]` – joel.wilson Dec 20 '16 at 19:41

user5249203 · Answer 2 · 2016-12-20T20:04:37.617

You can achieve this using data.table long to wide. various other examples are listed here.

library(data.table)
setDT(df)

if you have many columns, you can just do this

colnames(df)
cols<-colnames(df)[-1]
dcast(df, FAMID1 ~ ID1, value.var = cols)

output

   FAMID1 ID1.1_1 ID1.1_2 Age_1 Age_2 Sex_1 Sex_2  Q1_1  Q1_2  Q2_1 Q2_2
 1:   21     1       2      18    21     M     F   TRUE FALSE    4    2
 2:   22     1       2      13    16     F     F   TRUE FALSE    3    1
 3:   23     1       2      18    18     M     M   TRUE  TRUE    3    3

group unique rows in a dataframe by ID, and stack related values as new column

2 Answers2