1

I have a dataset in a "long" format that I want to change to a "wide" format. I want to group by a set of columns and group the remaining columns into corresponding pairs. I think I know to how to long to wide when it is just one column being 'widened', but I can't get it to work when I need multiple long widened at the same time.

Please look at the examples for the desired starting and end point.

STARTING:

   structure(list(gender = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L
   ), .Label = c("female", "male"), class = "factor"), state =     structure(c(3L, 
   3L, 3L, 1L, 1L, 1L, 2L), .Label = c("ca", "ny", "tx"), class = "factor"), 
name = structure(c(3L, 5L, 7L, 6L, 1L, 2L, 4L), .Label = c("ashley", 
"jackie", "john", "luke", "mark", "mary", "rob"), class = "factor"), 
value = c(1L, 2L, 3L, 1L, 2L, 3L, 1L)), .Names = c("gender", 
 "state", "name", "value"), class = "data.frame", row.names = c(NA, 
-7L))

ENDING:

  structure(list(gender = structure(c(2L, 1L, 2L), .Label = c("female", 
"male"), class = "factor"), state = structure(c(3L, 1L, 2L), .Label = c("ca", 
"ny", "tx"), class = "factor"), value1 = c(1L, 1L, 1L), name1 = structure(c(1L, 
3L, 2L), .Label = c("john", "luke", "mary"), class = "factor"), 
value2 = c(2L, 2L, NA), name2 = structure(c(2L, 1L, NA), .Label = c("ashley", 
"mark"), class = "factor"), value3 = c(3L, 3L, NA), name3 = structure(c(2L, 
1L, NA), .Label = c("jackie", "rob"), class = "factor")), .Names = c("gender", 
"state", "value1", "name1", "value2", "name2", "value3", "name3"
), class = "data.frame", row.names = c(NA, -3L))
runningbirds
  • 6,235
  • 13
  • 55
  • 94

1 Answers1

1

We can use dcast from data.table which can take multiple 'value.var' columns. We convert the data.frame to data.table (setDT(df1)), create a sequence column ("N"), grouped by "gender" and "state". Then, use dcast to convert from 'long' to 'wide' format.

library(data.table)
setDT(df1)[, N:= 1:.N, .(gender,state)]
dcast(df1, gender+state~N, value.var=c("name", "value"), sep="")
#    gender state name1  name2  name3 value1 value2 value3
#1: female    ca  mary ashley jackie      1      2      3
#2:   male    ny  luke     NA     NA      1     NA     NA
#3:   male    tx  john   mark    rob      1      2      3

This can be also done with reshape from base R after creating the sequence column

dfN <- transform(df1, N = ave(seq_along(state), 
                       gender, state, FUN=seq_along))
reshape(dfN, idvar=c('gender', 'state'), timevar= 'N', 
               direction='wide')
#  gender state name.1 value.1 name.2 value.2 name.3 value.3
#1   male    tx   john       1   mark       2    rob       3
#4 female    ca   mary       1 ashley       2 jackie       3
#7   male    ny   luke       1   <NA>      NA   <NA>      NA
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    awesome, couldn't find any examples on how to do this or figure it out with the multi column, long>>wide. thanks!!!!!!!!! – runningbirds Jan 24 '16 at 06:21