7

I have two datasets both of the same size [132,450000]. One with values and another with p-values corresponding to those values. Now I want to combine those two datasets so that I have 1 large dataframe [264,450000] with the column with values followed by the column with the corresponding p-values. The rownames are exactly the same and the column names are like: sample1 in df1 and sample1_pval in df2

For example I have two dataframes likes this

> df1
    x y
cg1 1 a
cg2 2 b
cg3 3 c
cg4 4 d
cg5 5 e

> df2
     x_pval y_pval 
cg1   6      f
cg2   7      g
cg3   8      h
cg4   9      i
cg5  10      j

And I want to merge them with this order: 1st column of df1 followed by 1st column of df2 followed by 2nd column of df1 followed by 2nd column of df2 etc etc.....

So then it will look like this:

> df
           x       x_pval    y        y_pval
cg1        1        6        a        f
cg2        2        7        b        g
cg3        3        8        c        h
cg4        4        9        d        i
cg5        5       10        e        j

I want to keep the columnnames but the rownames I can add later since they are the same in both dataframes. Since I'm working with a large dataset I dont want to type in all the columns and use "cbind". And I couldn't find a code for "merge" that does one column from each dataset at a time......

Is there a formula or package that does this?

Anyone who can help me out?

Sotos
  • 51,121
  • 6
  • 32
  • 66

4 Answers4

5

Another option would be to concatenate the sequence of columns of both datasets, order and then cbind

cbind(df1, df2)[order(c(seq_along(df1), seq_along(df2)))]
#    x x_pval y y_pval
#cg1 1      6 a      f
#cg2 2      7 b      g
#cg3 3      8 c      h
#cg4 4      9 d      i
#cg5 5     10 e      j
akrun
  • 874,273
  • 37
  • 540
  • 662
1

One idea is to cbind the data frames and order on column names prefixes, i.e.

dd <- cbind(df1, df2)
dd[order(sub('_.*', '', names(dd)))]

which gives,

    x x_pval y y_pval
cg1 1      6 a      f
cg2 2      7 b      g
cg3 3      8 c      h
cg4 4      9 d      i
cg5 5     10 e      j

If your columns are always structured as your example then this will also work,

data.frame(dd[c(TRUE, FALSE)], dd[c(FALSE, TRUE)]) #dd taken from above
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • Both these options seems to put them together in blocks. The first one in blocks of 12 and the second in blocks of 66. Thanks for your help anyways! it is solved now :) – Fleur Peters Sep 12 '17 at 10:50
1

You can reorder th columns using %%, to give you the odd/even column numbers.

EDIT.
This idea turned out to be bad, the original code was buggy beyond repair. After I saw the solution by @Sotos using order and his comment on the use of c(TRUE, FALSE) I decided to post a solution that works, inspired but not equal to that one. Here it is.

altern <- function(m, n){
    order(c(which(rep(c(TRUE, FALSE), m)), which(rep(c(FALSE, TRUE), n))))
}

df3 <- cbind(df1, df2)
n <- seq_along(names(df3))
df3 <- df3[, n[altern(ncol(df1), ncol(df2))]]
df3

As you can see, this solution is the most complicated of all.

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • This seems to put them together in blocks. I first get 66 columns of values and then 66 columns of pvalues. Thanks for your help :) it is solved now – Fleur Peters Sep 12 '17 at 10:48
  • @FleurPeters I works with the sample data you posted. Are you sure you're running the 3rd, last line of code? – Rui Barradas Sep 12 '17 at 10:52
  • Yes i know, I guess my example is just not really good, sorry about that. The data has the dimensions [132,450000] so i wanted to make a simple example instead of posting all that. If you make an example with a df with 4 columns your code also doesn't work. Maybe just with df with 2 columns..... – Fleur Peters Sep 12 '17 at 11:47
1

Or if you want to avoid sorting:

cbind(df1, df2)[rep(seq_along(df1),each=2)+rep(c(0,ncol(df1)),ncol(df1))]
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167