0

I have two dataframes Full Table and Dataset, the first one has Key_Identifier Accesion and Sequence columns while the other has only my data of interest with Accesion and Sequence columns.

What I Have

in Full Table:

Key_Identifier Accesion Sequence
ARG1 AA001 AAACTGGG
ARG2 AA002 AAACTGGC
ARS2 AA003 AAACTGGG
ART1 AA004 AAACTGGA
ARG3 AA005 AAACTGCG
ART4 AA006 AAACTGGG

and in Dataset:

Accesion Sequence
AA001 ACTGGG
AA003 ACTGGG
AA005 ACTGCG
AA006 ACTGGG

What I Want

I want to match Accesion from Dataset in Full Table and replace its name to Key_Identifer from Full Table keeping Sequence intact in Dataset. Like:

Accesion Sequence
ARG1 ACTGGG
ARS2 ACTGGG
ARG3 ACTGCG
ART4 ACTGGG

which is the best way to do this?

Community
  • 1
  • 1
Le Paul
  • 67
  • 8

2 Answers2

0

You can use merge to merge two data frames. Since column names are different this can be handled by by.x = 'key_identifier', by.y = 'accesion'.

a <- data.frame('key_identifier' = c('AAA001', 'AAA003', 'AAA005'),  'accesion'= c('ARG1', 'ARG2', 'ARS2'), 'sequence' = c('AAACTGGG', 'AAACTGGC', 'AAACTGGG'))
b <- data.frame('accesion' = c('AAA001', 'AAA003', 'AAA005'), 'sequence' = c('ACTGGG', 'ACTGGG', 'ACTGCG'))
c <- merge(a, b, by.x = 'key_identifier', by.y = 'accesion')

 key_identifier accesion sequence.x sequence.y
1         AAA001     ARG1   AAACTGGG     ACTGGG
2         AAA003     ARG2   AAACTGGC     ACTGGG
3         AAA005     ARS2   AAACTGGG     ACTGCG

Select necessary columns,

f <- c[,c(2,4)]
colnames(f) <- c('Accesion', 'Sequence')

Accesion Sequence
1     ARG1   ACTGGG
2     ARG2   ACTGGG
3     ARS2   ACTGCG
Casper
  • 1,469
  • 10
  • 19
0

The dplyr/the tidyverse works well for this since it's pretty intuitive for new R users. You can get rid of columns you don't care about with select(-col_to_delete), then inner_join to get just the rows where there's a match. From there it's a simple case of removing the extra column and renaming as needed:

Full_Table %>% 
  select(-Sequence) %>% 
  inner_join(
    Dataset,
    by = "Accesion"
  ) %>% 
  select(-Accesion) %>% 
  rename(
    Accesion = Key_Identifier
  )

For your simple example, I imagine merge from base R is sufficient, but if you are doing more complex transformations, definitely look into dplyr.

GenesRus
  • 1,057
  • 6
  • 16