
I have a dataframe with the following columns:

id condition trial   a0   a1 ... a7   b1   b2 ... b7
 1      1      1     23   56 ... 52   87   56 ... 98
 1      1      2     57   87 ... 87   55   52 ... 52
 1      1      3     23   56 ... 52   87   56 ... 98
 2      1      1     23   56 ... 82   97   56 ... 68

a and b are values measured at times 0, 1, ..., 7. Measure a has a value at time 0, but b does not.

I want to have it in the format:

id condition trial time measureA  measureB
 1     1      1     0       23      NA
 1     1      2     0       57      NA
 1     1      3     0       23      NA
 ...
 1     1      1     1       56      87
 1     1      2     1       87      55 
 ...

In other words, I have a time series in seconds (times 0-7) for two measures, spread across the a0-a7 and b1-b7 columns, and I want to "collect" each measure into a single column. One of the measures (a) also has a value at time 0.

This is what I have tried so far:

I used the function `gather` from the tidyr package:

library(tidyr)

# gather each measure's time columns into a key/value pair
x = gather(x, timeA, measureA, -c(1:3, 4:11))
x = gather(x, timeB, measureB, -c(1:3, 10, 11))

and could put the dataframe in the following format:

id condition trial    timeA  measureA   timeB  measureB
                         0       12       1       45
                         0       52       1       85
                         1       51       2       55
                         1       45       2       92
                         2       45       3       15
                         2       52       3       54

I wanted to use merge to combine the timeA and timeB halves, but it does not work because the resulting data frame is too big.

I used the code:

merge(x[1:5], x[6:7], by.x="timeA", by.y = "timeB")

I get the error message

Error: cannot allocate vector of size 40980.3 Gb

Is there any way to achieve what I want to do?

  • Please show the code you tried for merging. You probably tried an unintended cartesian join. – Roland May 24 '16 at 12:36
  • I edited the question including the code – Sininho May 24 '16 at 12:39
  • Your small example already shows that your `timeA` and `timeB` values are not unique. Thus, you get a cartesian join, which explains the huge memory demand. I suspect that you do not want this join, but the expected result is unclear. – Roland May 24 '16 at 12:41
  • They are indeed not unique. There is one value of timeA and timeB for each combination of the other variables (id, condition, trial). – Sininho May 24 '16 at 12:44
  • Provide a small reproducible example of what you have and the result you need and it would be easy to help you. As it is, composing a good answer involves too much guesswork. – Roland May 24 '16 at 12:46
  • I tried editing the question with a more complete view of what I have and what I want. I hope it makes it more clear. – Sininho May 24 '16 at 12:58
  • No, that isn't sufficient. Please read this FAQ: http://stackoverflow.com/a/5963610/1412059 We need something that lets us recreate your problem and shows the corresponding exact expected result. – Roland May 24 '16 at 13:02

1 Answer


I guess you could try using sqldf or data.table. Could you provide more info in your question: sample data, and what kind of merge you are trying (a left join)?

library(sqldf)
library(data.table)

using sqldf (df1, df2, and the key column x are placeholders):

# left outer join on the shared column x
res_sql = sqldf("SELECT * FROM df1 LEFT OUTER JOIN df2 ON df1.x = df2.x")
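
The LEFT OUTER JOIN keeps every row of df1 and fills the columns from df2 with NA wherever there is no match on x.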

using data.table (dt1 and dt2 are placeholder data.tables):

# inner join (nomatch = 0L drops non-matches); allow.cartesian permits many-to-many matches
res_dt = dt1[dt2, nomatch = 0L, on = "x", allow.cartesian = TRUE]
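
As Arun points out in the comments below, the question itself is probably best served by `data.table::melt`, which can collapse both sets of time columns in one call. A minimal sketch, assuming the wide table is the `x` data frame from the question with columns named exactly a0-a7 and b1-b7; the b0 padding step is my own addition to line the two groups up:

library(data.table)
setDT(x)  # convert the wide data frame to a data.table in place

# b has no measurement at time 0, so pad it with an NA column
# (use an NA type matching the other b columns) so that a0 pairs
# with b0, a1 with b1, and so on
x[, b0 := NA_real_]

long = melt(
  x,
  id.vars       = c("id", "condition", "trial"),
  measure.vars  = list(paste0("a", 0:7), paste0("b", 0:7)),
  value.name    = c("measureA", "measureB"),
  variable.name = "time"
)

# 'time' comes out as a factor with levels 1..8; map it back to 0..7
long[, time := as.integer(time) - 1L]

This gives one row per id/condition/trial/time, with measureA and measureB side by side and NA in measureB at time 0, which is the target format in the question.
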
  • I think this Q just needs `data.table::melt.data.table`. Also the argument is `allow.cartesian`, not `cartesian`. It's better to test the code. – Arun May 24 '16 at 13:06
  • Thank you for the heads up. I'm still trying to understand what exactly is the question. – rafa.pereira May 24 '16 at 13:11
  • OP wants all `a*` columns and `b*` columns collapsed together (in long format). Since `b0` doesn't exist, that needs to be filled with `NA` (corresponding to the column `a0`). – Arun May 24 '16 at 13:13
  • Thanks @Arun. You're very generous. Please feel free to post your answer or to edit mine. :) – rafa.pereira May 24 '16 at 13:16