
I have a vector mynumbers with several strings of numbers, say:

mynumbers <- c("122212", "134134", "134134", "142123", "212141", "213243", "213422", "214231", "221233")

My goal is to translate such strings into strings of letters following these relationships:

1=A
2=C
3=G
4=T

I'd like to encapsulate this in a function so that:

myletters <- translate_function(mynumbers)

myletters would thus be:

myletters <- c("ACCCAC", "AGTAGT", "AGTAGT", "ATCACG", "CACATA", "CAGCTG", "CAGTCC", "CATCGA", "CCACGG")

I'm thinking of a function like the one below, which is obviously not correct; I get confused when dealing with strsplit and lists:

translate_function <- function(numbers){
  map_df <- data.frame(num=1:4, nuc=c('A','C','G','T'))
  #strsplit numbers
  split_numbers <- strsplit(numbers, '')
  letters <- paste(sapply(split_numbers, function(x) map_df$nuc[which(map_df$num==x)]), collapse='')
  
  return(letters)
}
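For reference, the strsplit idea itself can be made to work. In the attempt above, `map_df$num==x` compares the four mapping digits element by element against all the digits of one string (with recycling) instead of looking each digit up, and the outer `paste(..., collapse='')` is applied to the combined `sapply()` result rather than to each string separately. A minimal sketch of a repaired version, assuming a named lookup vector called `lookup` (a name introduced here for illustration) in place of `map_df`:

```r
mynumbers <- c("122212", "134134", "134134", "142123", "212141",
               "213243", "213422", "214231", "221233")

translate_function <- function(numbers) {
  # Named lookup vector (hypothetical name): each name is a digit, each value its letter
  lookup <- c("1" = "A", "2" = "C", "3" = "G", "4" = "T")
  # strsplit() returns a list holding one vector of single digits per input string
  split_numbers <- strsplit(numbers, "")
  # Look up every digit by name, then collapse each string's letters separately
  vapply(split_numbers,
         function(x) paste(lookup[x], collapse = ""),
         character(1))
}

myletters <- translate_function(mynumbers)
print(myletters)
```

Any digit missing from `lookup` would come back as `NA` and be pasted in as the text "NA", so this only suits inputs restricted to 1 to 4.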

What would be the easiest and most elegant way to accomplish this? Thanks!

DaniCee

4 Answers


This is easy with chartr():

chartr("1234" , "ACGT", mynumbers)
[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA"
[9] "CCACGG"
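chartr() works character by character: the i-th character of its first argument is replaced by the i-th character of its second. A short illustration of that behaviour, including what happens to characters outside the mapping and how to reverse it:

```r
# chartr(old, new, x): the i-th character of `old` becomes the i-th of `new`
a <- chartr("1234", "ACGT", "122212")  # "ACCCAC"
# Characters that do not appear in `old` are left unchanged
b <- chartr("1234", "ACGT", "12-34")   # "AC-GT"
# Swapping the first two arguments gives the reverse translation
c_back <- chartr("ACGT", "1234", "ACCCAC")  # "122212"
print(c(a, b, c_back))
```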
Park

You can use stringr::str_replace_all with a named vector built from map_df: the names are the patterns to match and the values are their replacements.

map_df <- data.frame(num=1:4, nuc=c('A','C','G','T'), stringsAsFactors=FALSE)  # keep nuc as character in R < 4.0
stringr::str_replace_all(mynumbers, setNames(map_df$nuc, map_df$num))

#[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA" "CCACGG"
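If you'd rather avoid the stringr dependency, a rough base R analogue is to apply each name/value pair in turn with gsub(). This is a sketch, not stringr's implementation; `replace_all_named` is a hypothetical helper name, and it treats the names as literal strings (`fixed = TRUE`) rather than as regular expressions:

```r
mynumbers <- c("122212", "134134", "134134", "142123", "212141",
               "213243", "213422", "214231", "221233")

# Hypothetical helper: applies each name -> value pair of `replacements` in turn
replace_all_named <- function(x, replacements) {
  for (pattern in names(replacements)) {
    # fixed = TRUE matches each name literally instead of as a regex
    x <- gsub(pattern, replacements[[pattern]], x, fixed = TRUE)
  }
  x
}

replacements <- c("1" = "A", "2" = "C", "3" = "G", "4" = "T")
myletters <- replace_all_named(mynumbers, replacements)
print(myletters)
```

Replacing one pair at a time is only safe here because none of the replacement letters contains a digit that a later pattern could match again.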
Ronak Shah

Use it in a function this way:

translate_function <- function(numbers){
  map_df <- data.frame(num=1:4, nuc=c('A','C','G','T'))
  letters <- chartr(paste(map_df$num, collapse=''), paste(map_df$nuc, collapse=''), numbers)
  return(letters)
}
translate_function(mynumbers)

Output:

[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA"
[9] "CCACGG"
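The two paste() calls in the function above just turn each column of map_df into one string, which is the form chartr() expects. Checking the intermediate values directly:

```r
map_df <- data.frame(num = 1:4, nuc = c("A", "C", "G", "T"))
# paste(..., collapse = "") joins each column into a single string
old <- paste(map_df$num, collapse = "")  # "1234"
new <- paste(map_df$nuc, collapse = "")  # "ACGT"
# so the call in the function amounts to chartr("1234", "ACGT", numbers)
print(c(old, new))
```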

It is simpler without the data frame, though, since the mapping can be written directly into chartr():

translate_function <- function(numbers){
  letters <- chartr("1234", "ACGT", numbers)
  return(letters)
}
translate_function(mynumbers)

Output:

[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA"
[9] "CCACGG"
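If the mapping might change, one option is to pass it in as arguments with 1234/ACGT as defaults. A minimal sketch; the `from` and `to` parameter names are hypothetical, and the length check is an added safeguard, not something chartr() itself requires in both directions:

```r
mynumbers <- c("122212", "134134", "134134", "142123", "212141",
               "213243", "213422", "214231", "221233")

# Hypothetical variant: the mapping is supplied as arguments, defaulting to 1234 -> ACGT
translate_function <- function(numbers, from = "1234", to = "ACGT") {
  # Require equal lengths so every character in `from` has exactly one counterpart in `to`
  stopifnot(nchar(from) == nchar(to))
  chartr(from, to, numbers)
}

myletters <- translate_function(mynumbers)
# The same function translates back by swapping the mapping
back <- translate_function(myletters, from = "ACGT", to = "1234")
print(myletters)
```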
U13-Forward

Using the gsubfn package, with a named list as the replacement so each matched digit is looked up by name:

library(gsubfn)
gsubfn("(\\d)", setNames(as.list(c("A", "C", "G", "T")), 1:4), mynumbers)
[1] "ACCCAC" "AGTAGT" "AGTAGT" "ATCACG" "CACATA" "CAGCTG" "CAGTCC" "CATCGA" "CCACGG"
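The same match-then-look-up idea can be expressed in base R with gregexpr() and the `regmatches<-` replacement function. This is a swapped-in base R technique rather than what gsubfn does internally; `lookup` is a name introduced here for illustration:

```r
mynumbers <- c("122212", "134134", "134134", "142123", "212141",
               "213243", "213422", "214231", "221233")
# Hypothetical lookup vector: names are digits, values are letters
lookup <- c("1" = "A", "2" = "C", "3" = "G", "4" = "T")

x <- mynumbers
# Locate every single digit in every string
matches <- gregexpr("[0-9]", x)
# Replace each matched digit with its letter, looked up by name
regmatches(x, matches) <- lapply(regmatches(x, matches),
                                 function(d) unname(lookup[d]))
myletters <- x
print(myletters)
```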
akrun