R: translate strings of numbers into strings of letters following a relationships table

Question

I have a vector mynumbers with several strings of numbers, say:

mynumbers <- c("122212", "134134", "134134", "142123", "212141", "213243", "213422", "214231", "221233")

My goal is to translate such strings into strings of letters following these relationships:

1=A
2=C
3=G
4=T

I'd like to encapsulate this in a function so that:

myletters <- translate_function(mynumbers)

myletters would thus be:

myletters <- c("ACCCAC", "AGTAGT", "AGTAGT", "ATCACG", "CACATA", "CAGCTG", "CAGTCC", "CATCGA", "CCACGG")

I'm thinking of a function like the one below, which is obviously not correct. I get confused when dealing with strsplit and lists.

translate_function <- function(numbers){
  map_df <- data.frame(num=1:4, nuc=c('A','C','G','T'))
  #strsplit numbers
  split_numbers <- strsplit(numbers, '')
  letters <- paste(sapply(split_numbers, function(x) map_df$nuc[which(map_df$num==x)]), collapse='')
  
  return(letters)
}

What would be the easiest and most elegant way to accomplish this? Thanks!

score 4 · Accepted Answer · answered Sep 29 '21 at 05:22

This is easy with chartr:

chartr("1234", "ACGT", mynumbers)
[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA"
[9] "CCACGG"
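As a rough illustration of why this works: chartr translates character by character, replacing each character found in its first argument with the character at the same position in the second. The sketch below uses made-up inputs (the string "1925" and the variable names partial and back are assumptions for illustration, not part of the question) to show that unmapped characters pass through unchanged and that swapping the arguments reverses the translation.

```r
# chartr(old, new, x): each character of x that appears in `old` is
# replaced by the character at the same position in `new`.

# "9" and "5" are not in "1234", so they are left as they are
partial <- chartr("1234", "ACGT", "1925")
print(partial)

# Swapping the first two arguments maps letters back to digits
back <- chartr("ACGT", "1234", "ACCCAC")
print(back)
```

Because unmapped characters are kept silently, it may be worth validating the input separately if digits outside 1 to 4 should be treated as errors.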

Park

oh wow I wasn't aware of `chartr`, thanks! – DaniCee Sep 29 '21 at 06:59

score 4 · Answer 2 · answered Sep 29 '21 at 05:46

You can use stringr::str_replace_all: create a named vector from map_df, with the digits as names and the letters as values, and pass it as the replacement lookup.

map_df <- data.frame(num=1:4, nuc=c('A','C','G','T'))
stringr::str_replace_all(mynumbers, setNames(map_df$nuc, map_df$num))

#[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA" "CCACGG"

score 3 · Answer 3 · answered Sep 29 '21 at 05:25

Use it in a function this way:

translate_function <- function(numbers){
  map_df <- data.frame(num=1:4, nuc=c('A','C','G','T'))
  letters <- chartr(paste(map_df$num, collapse=''), paste(map_df$nuc, collapse=''), numbers)
  return(letters)
}
translate_function(mynumbers)

Output:

[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA"
[9] "CCACGG"

It's simpler without a data frame, though:

translate_function <- function(numbers){
  letters <- chartr("1234", "ACGT", numbers)
  return(letters)
}
translate_function(mynumbers)

Output:

[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA"
[9] "CCACGG"
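For comparison, the strsplit approach attempted in the question can also be made to work. This is only a sketch of one possible fix, not the asker's or any answerer's code: the function name translate_strsplit and the lookup vector are hypothetical names introduced here. The key points are that strsplit returns a list with one character vector per input string, and that the pasting has to happen inside the per-string function rather than once over the whole result.

```r
# Hypothetical helper (name chosen for illustration only)
translate_strsplit <- function(numbers) {
  # Named lookup vector: names are the digits, values are the letters
  lookup <- c("1" = "A", "2" = "C", "3" = "G", "4" = "T")

  # strsplit returns a list: one vector of single characters per string
  split_numbers <- strsplit(numbers, "")

  # For each string, index the lookup by name and paste the letters
  # back together; vapply guarantees one string per input element
  vapply(
    split_numbers,
    function(x) paste(lookup[x], collapse = ""),
    character(1)
  )
}

result <- translate_strsplit(c("122212", "134134", "221233"))
print(result)
```

This does the same job as chartr with more code, so it is mainly useful when the mapping is not one character to one character.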

score 1 · Answer 4 · answered Sep 29 '21 at 17:04

Using gsubfn

library(gsubfn)
gsubfn("(\\d)", setNames(as.list(c("A", "C", "G", "T")), 1:4), mynumbers)
[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA" "CCACGG"

akrun
