0

I'd really appreciate some help getting this messy set of new survey data into a usable form. It was collected in a strange way and now I've got strange data to work with. I've looked through tidyr and used those approaches to no end. I suspect my problem is that I'm thinking about this dataset all wrong and I'm blind to some real answer. But given all the things I need to do to this df, I cant figure out where to start and thus where to start googling.

What I need: For each person to be their own row Each person retains their GroupID and Treated value For the variables currently attached to each person individually to become columns (age, weight, height)

Fake (and much smaller):

structure(list(GroupID = 1:5, Treated = c("Y", "Y", "N", "Y", 
"N"), person1_age = c(45L, 33L, 71L, 19L, 52L), person1_weight = c(187L, 
145L, 136L, 201L, 168L), person1_height = c(69L, 64L, 51L, 70L, 
66L), person2_age = c(54L, 20L, 48L, 63L, 26L), person2_weight = c(140L, 
122L, 186L, 160L, 232L), person2_height = c(62L, 70L, 65L, 72L, 
74L), person3_age = c(21L, 56L, 40L, 59L, 67L), person3_weight = c(112L, 
143L, 187L, 194L, 159L), person3_height = c(61L, 69L, 73L, 63L, 
72L)), .Names = c("GroupID", "Treated", "person1_age", "person1_weight", 
"person1_height", "person2_age", "person2_weight", "person2_height", 
"person3_age", "person3_weight", "person3_height"), row.names = c(NA, 
5L), class = "data.frame")

Any help or further readings you could point me to would be very much appreciated.

EpiOregon
  • 19
  • 3

2 Answers2

1

reshape can do this, with the appropriate arguments:

> reshape(x, direction="long", varying=names(x)[3:11], timevar='person', v.names=c('height', 'age', 'weight'), sep='_')
    GroupID Treated person height age weight id
1.1       1       Y      1    187  45     69  1
2.1       2       Y      1    145  33     64  2
3.1       3       N      1    136  71     51  3
4.1       4       Y      1    201  19     70  4
5.1       5       N      1    168  52     66  5
1.2       1       Y      2    140  54     62  1
2.2       2       Y      2    122  20     70  2
3.2       3       N      2    186  48     65  3
4.2       4       Y      2    160  63     72  4
5.2       5       N      2    232  26     74  5
1.3       1       Y      3    112  21     61  1
2.3       2       Y      3    143  56     69  2
3.3       3       N      3    187  40     73  3
4.3       4       Y      3    194  59     63  4
5.3       5       N      3    159  67     72  5

This relies on the order of the columns in your original data, for the varying argument, being in increasing order in the original data.

If that's not the case, specify varying manually. Here's what is used above:

> names(x)[3:11]
[1] "person1_age"    "person1_weight" "person1_height" "person2_age"    "person2_weight" "person2_height"
[7] "person3_age"    "person3_weight" "person3_height"
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
0

We can also use melt from data.table which can take multiple patterns in the measure argument

library(data.table)
melt(setDT(x), measure = patterns("age$", "weight$", "height$"), 
     variable.name = "person", value.name = c("age", "weight", "height"))
#     GroupID Treated person age weight height
# 1:       1       Y      1  45    187     69
# 2:       2       Y      1  33    145     64
# 3:       3       N      1  71    136     51
# 4:       4       Y      1  19    201     70
# 5:       5       N      1  52    168     66
# 6:       1       Y      2  54    140     62
# 7:       2       Y      2  20    122     70
# 8:       3       N      2  48    186     65
# 9:       4       Y      2  63    160     72
#10:       5       N      2  26    232     74
#11:       1       Y      3  21    112     61
#12:       2       Y      3  56    143     69
#13:       3       N      3  40    187     73
#14:       4       Y      3  59    194     63
#15:       5       N      3  67    159     72
akrun
  • 874,273
  • 37
  • 540
  • 662