I have two data frames: mRNA (here) and RPPA (here). The mRNA data frame has 1,212 columns, while the RPPA data frame has 937 columns. All column names in the RPPA data frame also appear in the mRNA data frame (but not in the same order). Within the columns, the values differ between the two data frames.
I want to create a new mRNA data frame that contains the same columns as the RPPA data frame and leaves out any columns that do not appear in the ("old") mRNA data frame.
An example:

mRNA <- data.frame(A=c(25,76,23,45), B=c(56,89,12,452), C=c(45,456,243,5), D=c(13,65,23,16), E=c(17:20), F=c(256,34,0,5))  
RPPA <- data.frame(B=c(46,47,45,49), A=c(51,87,34,87), D=c(76,34,98,23))  

The expected result would be:

> new.mRNA
B     A     D
56    25    13
89    76    65
12    23    23
452   45    16

I've tried converting the RPPA column names into a vector and then using it with the command mRNA[col.names.vector], as described here, but it doesn't work. It gives the error undefined columns selected.

Is there a quick way to do it (without functions, loops etc.)?

Debby
  • Please check if you have leading/lagging spaces in your column names – akrun Jan 15 '17 at 18:45
  • @akrun sorry, I'm really new to r. how do I check this? – Debby Jan 16 '17 at 11:49
  • @deborah It is easy to check. `colnames(mRNA); colnames(RPPA)` – akrun Jan 16 '17 at 12:19
  • @akrun I don't think I have spaces, but I do have numerous dots. example of column name: **TCGA.3C.AALI.01A.21.A43F.20**. Is that a problem? – Debby Jan 16 '17 at 12:31
  • It could be a problem. Check whether you have the same dots in the column names of both datasets – akrun Jan 16 '17 at 12:32
  • Yes, the dots are the same. I've added a link to the files, if you could maybe view it it would be very helpful. – Debby Jan 16 '17 at 12:44
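
The checks suggested in these comments can be sketched in a few lines. This is only an illustration on made-up toy data, not the real files: the column "Z" and the trailing space in "A " are assumptions added to show what a mismatch looks like. `setdiff()` lists the RPPA names that have no exact match in mRNA, and `trimws()` strips stray whitespace before comparing again.

```r
# Toy data (hypothetical): RPPA has a column "Z" that mRNA lacks,
# and one RPPA name carries a trailing space.
mRNA <- data.frame(A = 1:2, B = 3:4, C = 5:6)
RPPA <- data.frame(B = 7:8, A = 9:10, Z = 11:12)
names(RPPA)[2] <- "A "   # simulate stray whitespace in a column name

# RPPA names with no exact match in mRNA; these would trigger
# "undefined columns selected" when used to subset mRNA
missing_in_mRNA <- setdiff(colnames(RPPA), colnames(mRNA))
print(missing_in_mRNA)

# Strip leading/trailing whitespace and compare again
colnames(RPPA) <- trimws(colnames(RPPA))
still_missing <- setdiff(colnames(RPPA), colnames(mRNA))
print(still_missing)
```

If `still_missing` is not empty after trimming, those names are genuinely absent from mRNA, which would point toward taking the intersection of the two sets of names rather than subsetting by all RPPA names.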

4 Answers

Neither of the posted answers worked for my data. Thanks to both of them, and with a little more research, I figured out the answer. First, you need to generate a vector that includes ONLY the column names that appear in BOTH data frames. To do that, I used intersect together with Reduce:

target <- Reduce(intersect, list(colnames(mRNA), colnames(RPPA)))

Now you can use the answer that was given:

new.mRNA <- mRNA[target]

and this will generate a new data frame with the right values.
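
One detail that may matter here: `intersect()` returns the common names in the order of its first argument. The sketch below uses the example data from the question plus a hypothetical extra column "Z" (an assumption, added only so that RPPA has a name mRNA lacks). It shows that passing the RPPA names first gives the column order of the expected result.

```r
mRNA <- data.frame(A = c(25, 76, 23, 45), B = c(56, 89, 12, 452),
                   C = c(45, 456, 243, 5), D = c(13, 65, 23, 16))
# "Z" is a hypothetical column that exists only in RPPA
RPPA <- data.frame(B = c(46, 47, 45, 49), A = c(51, 87, 34, 87),
                   D = c(76, 34, 98, 23), Z = c(1, 2, 3, 4))

# intersect() keeps the order of its first argument
target_mRNA_order <- intersect(colnames(mRNA), colnames(RPPA))
target_RPPA_order <- intersect(colnames(RPPA), colnames(mRNA))

# Subset mRNA by the shared names, in RPPA's column order
new.mRNA <- mRNA[target_RPPA_order]
print(new.mRNA)
```

For only two data frames, a plain `intersect()` call is enough; `Reduce()` becomes useful when the common names of three or more data frames are needed.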
Thank you @akrun and @Titolondon for your help

Debby

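You can find the columns of mRNA that do not appear in RPPA and drop them, as in the code below. This needs the dplyr package, and the remaining columns stay in mRNA's order.

library(dplyr)
# Columns of mRNA that are absent from RPPA
col_name <- setdiff(colnames(mRNA), colnames(RPPA))
new_mRNA <- mRNA %>% select(-all_of(col_name))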

We can subset the mRNA by the column names of 'RPPA' and assign it to 'RPPA'

RPPA[] <- mRNA[names(RPPA)]
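
To make the effect of the `[] <-` assignment concrete, here is a small sketch on the example data. The name `RPPA_overwritten` is hypothetical and is used only so that the original RPPA stays untouched for comparison. The assignment keeps the data frame's column names but replaces every value with the matching mRNA column; assigning to a new name instead leaves RPPA as it was.

```r
mRNA <- data.frame(A = c(25, 76, 23, 45), B = c(56, 89, 12, 452),
                   D = c(13, 65, 23, 16))
RPPA <- data.frame(B = c(46, 47, 45, 49), A = c(51, 87, 34, 87),
                   D = c(76, 34, 98, 23))

# Work on a copy so the original RPPA values remain available
RPPA_overwritten <- RPPA
RPPA_overwritten[] <- mRNA[names(RPPA)]   # values now come from mRNA

# Non-destructive alternative: store the subset under a new name
new.mRNA <- mRNA[names(RPPA)]
print(new.mRNA)
```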
akrun

Subsetting a data.frame with a vector of column names should work.

  1. Create a vector of the column names you want to keep
  2. Subset your data.frame using this vector


mRNA <- data.frame(A=c(25,76,23,45), B=c(56,89,12,452), C=c(45,456,243,5), D=c(13,65,23,16), E=c(17:20), F=c(256,34,0,5))  
RPPA <- data.frame(B=c(46,47,45,49), A=c(51,87,34,87), D=c(76,34,98,23))  

mRNA
#>    A   B   C  D  E   F
#> 1 25  56  45 13 17 256
#> 2 76  89 456 65 18  34
#> 3 23  12 243 23 19   0
#> 4 45 452   5 16 20   5
RPPA
#>    B  A  D
#> 1 46 51 76
#> 2 47 87 34
#> 3 45 34 98
#> 4 49 87 23
mRNA[, names(RPPA)]
#>     B  A  D
#> 1  56 25 13
#> 2  89 76 65
#> 3  12 23 23
#> 4 452 45 16
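
Two edge cases of this `[, names]` form are worth a quick sketch. The data below is a trimmed version of the example, and the name "Z" is a hypothetical column that does not exist in mRNA. First, any name that is missing from mRNA makes the whole subset fail, which matches the error reported in the question. Second, when only one column is selected, `[, j]` returns a plain vector unless `drop = FALSE` is given.

```r
mRNA <- data.frame(A = c(25, 76, 23, 45), B = c(56, 89, 12, 452))

# "Z" is not a column of mRNA, so the subset raises an error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(mRNA[, c("B", "Z")], error = function(e) conditionMessage(e))
print(msg)

# Selecting a single column: vector by default, data frame with drop = FALSE
one_col_vector <- mRNA[, "B"]
one_col_frame  <- mRNA[, "B", drop = FALSE]
```

The list-style form `mRNA["B"]` always returns a data frame, so it avoids the `drop` question altogether.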
cderv
  • How is this answer different from mine? – akrun Jan 15 '17 at 17:42
  • Thanks for the quick reply. Actually, both answers are not yet what I'm looking for. When I'm doing akrun's answer, all values in the RPPA data frame are becoming NA's. When I do Titolondon's answer, I get again the error described above (undefined columns selected). – Debby Jan 15 '17 at 17:48
  • @deborah I couldn't reproduce your NA's based on the example you provided. I am getting the expected output as you showed – akrun Jan 15 '17 at 17:58
  • @akrun The row names are different between mRNA and RPPA. Could that be the reason? – Debby Jan 15 '17 at 18:03
  • @deborah Please try the code on the example you only posted – akrun Jan 15 '17 at 18:04
  • @akrun It is working with the example, but not with my actual data... I can't figure what's the problem – Debby Jan 15 '17 at 18:44