0

I have to data frame as ‘datExpr’ and ‘Clin_dat’. Both of them has 50 rows that all row names are equally one by one but not in the same order. I would like to join them to new data frame based on row names. Clin_dat:

           batch_STF
STT5840_LMS     1
STT5977_LMS     1
STT5980_LMS     2
STT5528_LMS     1
STT516_LMS      2

And datExpr is:

             C9orf152     ELMO2       RPS11      CREB3L1    PNMA1
STT6023_LMS 4.091188    0.8606005   11.954766   3.012608    2.5594348
STT6024_LMS 3.488034    2.0236382   12.183710   3.672799    2.1944313
STT6027_LMS 4.463488    2.3819448   11.080929   6.182108    3.7694720
STT6030_LMS 3.284857    2.2709810   12.034030   9.193188    2.5655489
STT6032_LMS 3.608634    1.4886245   13.277227   6.375075    1.4886245

I use below command :

datExpr_batch <- merge(clin_dat,datExpr,by = "row.names", all=TRUE)

so my expectation is a new data frame by 50 rows and colnames equal names of 2 data frame same as below:

        batch_STF  C9orf152       ELMO2       RPS11      CREB3L1    PNMA1
STT5840_LMS 1       4.091188    0.8606005   11.954766   3.012608    2.5594348
STT5977_LMS 1       3.488034    2.0236382   12.183710   3.672799    2.1944313
STT5980_LMS 2       4.463488    2.3819448   11.080929   6.182108    3.7694720
STT5528_LMS 1       3.284857    2.2709810   12.034030   9.193188    2.5655489
STT516_LMS  2       3.608634    1.4886245   13.277227   6.375075    1.4886245

but datExpr_batch has 88 rows. Same as below:

      Row.names    batch_STF C9orf152    ELMO2         RPS11    CREB3L1     PNMA1
  1 STT1220_LMS     NA        4.535153  2.4097580   12.723243   7.664104    1.6585204
  2 STT1220_LMS     2         NA              NA         NA        NA          NA
  3 STT516_LMS      NA        5.946659  2.9526839   12.521319   3.584952    4.0230725
  4 STT516_LMS      2         NA              NA         NA        NA          NA
  5 STT5528_LMS     1         3.361717  2.2189714   11.534295   6.691495    1.8199992
  6 STT5839_LMS     NA      5.148852    2.5861328   11.561949   10.344959   3.7855736
  7 STT5839_LMS     1          NA          NA          NA          NA         NA

I apprecite it if anybody shares his/her comment with me.

Mohammad
  • 103
  • 6
  • I don't understand what "all row names are equally one by one but not in the same order" means. The data you posted doesn't include any row names that are in both datasets, so we can't actually see what your merge would look like – camille Dec 12 '19 at 20:32
  • @camille probably STT5840_LMS is on the first row for clin_dat but not for datExpr. Appearantly merge, as specified here, isn't able to deal with it (although that surprises me a bit) – Annet Dec 12 '19 at 21:20
  • Dear@camille, as I wrote datExpr and clin_dat have 50 rows and theses are samples of them."all row names are equally one by one but not in the same order" means that all rows in 'datExpr' also are in row names of 'cli_dat' but not as the same order. – Mohammad Dec 12 '19 at 21:26
  • Can you post samples that will recreate the issue though, like where there are IDs shared between the two datasets? – camille Dec 12 '19 at 21:43
  • Dear @camille, I can send two data frames for you. but let me know how? – Mohammad Dec 12 '19 at 21:48
  • Take a look at the guidance on making [reproducible examples](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – camille Dec 12 '19 at 21:49

1 Answers1

0

Elsewhere on stackoverflow you can find some ideas on how to solve it.

Another solution might be to put the rownames in a column using rownames_to_column from the tibble package. Using the dplyr package you can create some code like this:

library(tibble)
library(dplyr)
 datExpr_batch <- datExpr %>% 
                 rownames_to_column() %>%
                 full_join(clin_dat %>% rownames_to_column(), 
                           by = "rowname")
Annet
  • 846
  • 3
  • 14
  • Dear @Annet, I would like to join 'datExpr' and 'clin_dat'. So, datExpr_batch is my result of joining two that data frames. could you kindly revise your code? – Mohammad Dec 12 '19 at 21:15
  • I revised it for you – Annet Dec 12 '19 at 21:17
  • Thanks@Annet. I ran your revised code but I get this error: Error in rownames_to_column(.) : could not find function "rownames_to_column" – Mohammad Dec 12 '19 at 21:24
  • Did you load the tibble package? https://tibble.tidyverse.org/reference/rownames.html – Annet Dec 12 '19 at 21:26
  • Dear @Annet, Now by calling tibble package, your revised code does work but again the 'datExpr_batch' has 88 rows. 1 to 50 of the datExpr rows are complete in this join but the column of batch_STF is not complete. the values in batch_STF for some rows are complete and for some of them no. but in 33 additional rows, values of genes are empty but values of batch_STF are complete. – Mohammad Dec 12 '19 at 21:40
  • So your rownames are not equal/similar. The problem is not in the order of the rownames (which would have surprised me if it had been). As you now have the rownames in an easy mutatable column you can adjust them so that they become equal. It is hard to figure out the exact difference with the limited info provided here. – Annet Dec 12 '19 at 21:54
  • Dear@Annet, suppose I add a column in each data frame based on their row names and I call them "Cases_barcode"(datExpr$Cases_barcode<- rownames and also for clin_dat too). Now, I have one added column in datExpr and clin_dat. So, how can I adjust them by these 2 new columns? – Mohammad Dec 12 '19 at 22:16
  • Dear @Annet, I also check all rows of datExpr in clin_dat by searching in MS Excel. all rownames in datExpr are in clin_dat. So, I appreciate if you guide me how to check and adjust them in R – Mohammad Dec 12 '19 at 23:26
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/204159/discussion-between-mohammad-and-annet). – Mohammad Dec 12 '19 at 23:27