
I would like to replace the symbols in data frame DATA with the corresponding gene_id values from data frame NAMES.

NAMES

gene_id              symbols
ENSMUSG00000000001   Gnai3
ENSMUSG00000000003   Pbsn
ENSMUSG00000000028   Cdc45


DATA

symbols  sample1 sample2 sample3
Gnai3    1        3       3
Pbsn     1        3       3
Cdc45    3        3       3


giegie
Does this answer your question? [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – camille Nov 21 '19 at 17:07

2 Answers


We can do this with a simple inner_join from dplyr:

library(dplyr)

DATA %>%
  inner_join(NAMES, by = "symbols") %>%
  select(symbols = gene_id, sample1:sample3)

Or with base R:

output <- merge(NAMES, DATA, by = "symbols")[,-1]
names(output)[1] <- 'symbols'

Output (from the dplyr version; merge() returns the same rows but sorted alphabetically by symbol):

             symbols sample1 sample2 sample3
1 ENSMUSG00000000001       1       3       3
2 ENSMUSG00000000003       1       3       3
3 ENSMUSG00000000028       3       3       3
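By default merge() sorts its result on the by column (sort = TRUE), so the base R version lists Cdc45 before Gnai3 and Pbsn, whereas dplyr's inner_join keeps DATA's row order. If the original order matters, a minimal sketch is to reorder the merged rows with match() before dropping the old column. It assumes each symbol appears only once in NAMES and that every symbol in DATA has a match:

```r
# Example data from the question (stringsAsFactors = FALSE keeps strings as character on R < 4.0)
DATA <- data.frame(symbols = c("Gnai3", "Pbsn", "Cdc45"),
                   sample1 = c(1L, 1L, 3L), sample2 = c(3L, 3L, 3L),
                   sample3 = c(3L, 3L, 3L), stringsAsFactors = FALSE)
NAMES <- data.frame(gene_id = c("ENSMUSG00000000001", "ENSMUSG00000000003",
                                "ENSMUSG00000000028"),
                    symbols = c("Gnai3", "Pbsn", "Cdc45"),
                    stringsAsFactors = FALSE)

# merge() puts the key column first and sorts rows by it: Cdc45, Gnai3, Pbsn
merged <- merge(NAMES, DATA, by = "symbols")

# Reorder the merged rows to follow DATA's original symbol order
# (assumes symbols are unique in NAMES and all of them matched)
merged <- merged[match(DATA$symbols, merged$symbols), ]

# Drop the old symbols column, then rename gene_id to symbols
output <- merged[, -1]
names(output)[1] <- "symbols"
rownames(output) <- NULL  # reset row names to 1, 2, 3
output
```

Passing sort = FALSE to merge() skips the sorting, but the documentation describes the resulting row order as unspecified, so the explicit match() reorder is the safer choice.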

Data:

DATA <- structure(list(symbols = c("Gnai3", "Pbsn", "Cdc45"), sample1 = c(1L, 
1L, 3L), sample2 = c(3L, 3L, 3L), sample3 = c(3L, 3L, 3L)), class = "data.frame", row.names = c(NA, 
-3L))

NAMES <- structure(list(gene_id = c("ENSMUSG00000000001", "ENSMUSG00000000003", 
"ENSMUSG00000000028"), symbols = c("Gnai3", "Pbsn", "Cdc45")), class = "data.frame", row.names = c(NA, 
-3L))
acylam

In base R you can use match() to look up the corresponding gene_id in data frame NAMES:

DATA$symbols  <- NAMES$gene_id[match(DATA$symbols, NAMES$symbols)]
DATA
#             symbols sample1 sample2 sample3
#1 ENSMUSG00000000001       1       3       3
#2 ENSMUSG00000000003       1       3       3
#3 ENSMUSG00000000028       3       3       3
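match() returns NA for any symbol that has no entry in NAMES, so that symbol would be overwritten with NA. A minimal sketch that keeps the original symbol in that case. The extra symbol "NotAGene" and its sample value are hypothetical, added only to show the unmatched case:

```r
NAMES <- data.frame(gene_id = c("ENSMUSG00000000001", "ENSMUSG00000000003",
                                "ENSMUSG00000000028"),
                    symbols = c("Gnai3", "Pbsn", "Cdc45"),
                    stringsAsFactors = FALSE)
# "NotAGene" is a hypothetical symbol that does not appear in NAMES
DATA <- data.frame(symbols = c("Gnai3", "Pbsn", "Cdc45", "NotAGene"),
                   sample1 = c(1L, 1L, 3L, 2L), stringsAsFactors = FALSE)

# Position of each DATA symbol in NAMES, or NA when it is absent
idx <- match(DATA$symbols, NAMES$symbols)
ids <- NAMES$gene_id[idx]  # NA for "NotAGene"

# Keep the original symbol wherever no gene_id was found
DATA$symbols <- ifelse(is.na(ids), DATA$symbols, ids)
DATA
```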

Or you can build a named lookup vector and subset it by name:

DATA$symbols <- setNames(NAMES$gene_id, NAMES$symbols)[DATA$symbols]
DATA
#             symbols sample1 sample2 sample3
#1 ENSMUSG00000000001       1       3       3
#2 ENSMUSG00000000003       1       3       3
#3 ENSMUSG00000000028       3       3       3
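Subsetting the named vector also carries the original symbols along as names on the result, and the new column can keep them as a names attribute even though the printed data frame does not show them. A small sketch that strips them with unname(); as with match(), a symbol missing from NAMES would come back as NA:

```r
NAMES <- data.frame(gene_id = c("ENSMUSG00000000001", "ENSMUSG00000000003",
                                "ENSMUSG00000000028"),
                    symbols = c("Gnai3", "Pbsn", "Cdc45"),
                    stringsAsFactors = FALSE)
DATA <- data.frame(symbols = c("Gnai3", "Pbsn", "Cdc45"),
                   sample1 = c(1L, 1L, 3L), stringsAsFactors = FALSE)

# Named lookup vector: names are symbols, values are gene IDs
lookup <- setNames(NAMES$gene_id, NAMES$symbols)

# Indexing by name returns gene IDs still named by the old symbols
lookup[DATA$symbols]

# unname() drops those names before storing the column
DATA$symbols <- unname(lookup[DATA$symbols])
DATA
```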
GKi