1

I currently have wide data which looks similar to this:

cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765

etc

And I wish into a long data frame which looks similar to this:

cid dyad f op ed junk  id
1   2    0 2  5  0.876 1
1   2    0 4  7  0.876 2
1   5    0 2  4  0.765 1
1   5    1 4  3  0.765 2 

I have tried using the gather() function as well as the reshape() function but cannot figure out how to create multiple columns instead of collapsing all of the columns into a long style

All help is appreciated

James Todd
  • 113
  • 1
  • 3
  • 12
  • 2
    A hint would be `melt` from `data.table` – akrun Dec 21 '17 at 15:46
  • 4
    I _think_ this may be the canonical answer here: [Reshaping multiple sets of measurement columns (wide format) into single columns (long format)](https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns) – Henrik Dec 21 '17 at 15:52
  • @akrun hwo do you use the melt() function in this instance? – James Todd Dec 21 '17 at 16:11
  • @Henrik I don't seem to be able to replicate the code for my dataset – James Todd Dec 21 '17 at 16:12
  • For getting what @Henrik suggested to work you may need to bind to two replicates of `junk` column to have the same size of variable columns in `reshape`. – M-- Dec 21 '17 at 16:51
  • `library(data.table); melt(as.data.table(dat), measure.vars = patterns("^f", "^op", "^ed"))`, to make the hint more explicit. – A5C1D2H2I1M1N2O1R2T1 Dec 21 '17 at 17:02

2 Answers2

4

You can use the base reshape() function to (roughly) simultaneously melt over multiple sets of variables, by using the varying parameter and setting direction to "long".

For example here, you are supplying a list of three "sets" (vectors) of variable names to the varying argument:

dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header=TRUE)

reshape(dat, direction="long", 
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")), 
        v.names=c("f","op","ed"))

You'll end up with this:

    cid dyad  junk time f op ed id
1.1   1    2 0.876    1 0  2  5  1
2.1   1    5 0.765    1 0  2  4  2
1.2   1    2 0.876    2 0  4  7  1
2.2   1    5 0.765    2 1  4  3  2

Notice that two variables get created, in addition to the three sets getting collapsed: an $id variable -- which tracks the row number in the original table (dat), and a $time variable -- which corresponds to the order of the original variables that were collapsed. There are also now nested row numbers -- 1.1, 2.1, 1.2, 2.2, which here are just the values of $id and $time at that row, respectively.

Without knowing exactly what you're trying to track, hard to say whether $id or $time is what you want to use as the row identifier, but they're both there.

Might also be useful to play with the parameters timevar and idvar (you can set timevar to NULL, for example).

reshape(dat, direction="long", 
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")), 
        v.names=c("f","op","ed"), 
        timevar="id1", idvar="id2")
lefft
  • 2,065
  • 13
  • 20
0

The tidyr package can solve this problem using the function gather, separate and spread:

df<-read.table(header=TRUE, text="cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
               1   5    0  1  2   4   4   3   0.765")

library(tidyr)

print(df %>%gather( name, value, -c(cid, dyad, junk)) %>% 
  separate( name, into=c("name", "id"), sep= -2 ) %>%
  spread( key=c(name), value)
)


#step by step:
  #collect the columns f, op, ed to the common cid, dyad and junk
df<-gather(df, name, value, -c(cid, dyad, junk))
  #separate the number id from the names
df<-separate(df, name, into=c("name", "id"), sep= -2 )
  #made wide again.
df<-spread(df, key=c(name), value)
Dave2e
  • 22,192
  • 18
  • 42
  • 50