2

I found the code below. It works nicely, but it becomes error-prone once the full alphabet is involved.

ID = c(1,2,3)
POS1 = c('AG','GC','TT')
POS2 = c('GT','CC','TC')
POS3 = c('GG','CT','AT')
DF = data.frame(ID,POS1,POS2,POS3)
DF$POS1X <- chartr('ACGT','1234',DF$POS1)

But I am looking for something that does not require typing all the letters and numbers into the code. Using the same data frame, what I am after is a loop that will convert "a" into 1, "b" into 2, etc.

Update: I tried the code below so that I would not create another column and would just modify the existing POS1. It did not work, though.

ID = c(1,2,3)
POS1 = c('AG','GC','TT')
POS2 = c('GT','CC','TC')
POS3 = c('GG','CT','AT')
DF = data.frame(ID,POS1,POS2,POS3)

Just changing POS1 from factor to character:

DF$POS1  <- as.character(as.factor(DF$POS1))

map<-data.frame(LETTERS,as.character(1:26))
names(map)<-c("letters","numbers")

let2nums <- function(string){
  splitme <- unlist(strsplit(string,""))
  returnme <- as.integer(map[map$letters %in% splitme,]$numbers)
  return(as.numeric(returnme))
}

DF$POS1 <- mapply(let2nums, DF$POS1)

The outcome is rather interesting :) Any idea why?

Kalenji
  • 2
    You probably would have had fewer possible sources of error had you made these character columns rather than leaving them as factor. – IRTFM May 22 '17 at 15:21
  • This is a [related post](https://stackoverflow.com/questions/37239715/convert-letters-to-numbers/37239786). – lmo May 22 '17 at 16:10

3 Answers

4

One option is to create a key/value pair and then replace the values with gsubfn:

library(gsubfn)
v1 <- setNames(seq_along(LETTERS), LETTERS)
DF[-1] <- lapply(DF[-1], function(x) gsubfn('(.)', as.list(v1), as.character(x)))
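
The same key/value idea can also be sketched in base R, without gsubfn. This is only an illustration under a few assumptions: the helper name `letters_to_digits` is hypothetical, lowercase input is upper-cased first, and characters that are not letters are left unchanged.

```r
# Named lookup vector: names are letters, values are alphabet positions
v1 <- setNames(seq_along(LETTERS), LETTERS)

# Hypothetical helper: replace each letter in every string by its position
letters_to_digits <- function(x) {
  chars <- strsplit(toupper(as.character(x)), "")
  vapply(chars, function(ch) {
    idx <- match(ch, names(v1))
    # characters missing from the lookup are kept unchanged
    paste(ifelse(is.na(idx), ch, v1[idx]), collapse = "")
  }, character(1))
}

DF <- data.frame(ID = 1:3,
                 POS1 = c("AG", "GC", "TT"),
                 POS2 = c("GT", "CC", "TC"),
                 POS3 = c("GG", "CT", "AT"))

# Overwrite every column except ID in place
DF[-1] <- lapply(DF[-1], letters_to_digits)
```

Because the columns are overwritten in place, no extra POS1X-style columns are created.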
akrun
  • Works like magic. Just one simple question: why are only capital letters changed and not lowercase letters? – Kalenji May 23 '17 at 12:16
  • 1
    @Kalenji You can just change it to `gsubfn('(.)', as.list(v1), toupper(as.character(x)))` – akrun May 23 '17 at 12:42
  • 2
    If you used c(LETTERS,letters) instead of just LETTERS you would get both upper and lowercase letters converted. – IRTFM May 23 '17 at 22:20
2

If you're really looking to process it through a loop as you said, you can do something like this.

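# Pre-allocate the result column, then fill it row by row
DF$POS1X <- character(nrow(DF))
for(i in seq_len(nrow(DF)))
{
  # as.character() guards against POS1 being stored as a factor
  DF$POS1X[i] <- paste(match(strsplit(toupper(as.character(DF$POS1[i])), "")[[1]], LETTERS), collapse = "")
}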

You could alternatively apply this as a function using mapply as below.

letter.to.number <- function(x)
{
  num <- paste(match(strsplit(toupper(x), "")[[1]],LETTERS), collapse = "")
  return(num)
}

DF$POS1X <- mapply(letter.to.number, DF$POS1)
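
Concatenated positions can be ambiguous (for example, "111" could stand for AAA, AK or KA). If that matters, a zero-padded variant is one way to keep the result decodable. The sketch below is an assumption about what might be wanted rather than part of the original request, and `letter.to.code` is a hypothetical name.

```r
# Hypothetical variant: every letter becomes exactly two digits (A -> 01, Z -> 26)
letter.to.code <- function(x)
{
  pos <- match(strsplit(toupper(as.character(x)), "")[[1]], LETTERS)
  # drop characters that are not letters, then zero-pad each position
  paste(sprintf("%02d", pos[!is.na(pos)]), collapse = "")
}

codes <- vapply(c("AG", "GC", "TT", "LKJ"), letter.to.code,
                character(1), USE.NAMES = FALSE)
```

With fixed-width codes, the string can be split back into letters two digits at a time.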
Matt Jewett
  • The chartr function has already been offered by the questioner, and is both more expressive and more efficient. – IRTFM May 22 '17 at 15:39
  • The OP actually requested an alternative to chartr, as it requires hard coding the translation and also breaks because chartr maps each character to exactly one character, so the second digit of a two-digit number is dropped. For example, `chartr('ABCDEFGHIJKL','123456789101112', "LKJ")` returns `101`, not `121110` as the OP would have liked. The code I have provided gives the requested functionality and is a viable solution to the problem. – Matt Jewett May 22 '17 at 15:50
  • I'm guessing that the OP did not want such a monstrosity... such a result has no separators and is completely ambiguous. – IRTFM May 22 '17 at 23:42
  • I agree that having the final output in a more structured format would certainly yield many benefits, either through the use of separators or simply by having single digit numbers preceded by zero to ensure every character is always represented by two digits. However that was not part of the original request, and without knowing the intended use of the final output it may not be necessary to do so. I would encourage you to make those suggestions to the OP, they may find your feedback useful. Alternatively if you have a better solution to offer, I'm sure you know where the answer button is. – Matt Jewett May 23 '17 at 04:16
  • Thanks all for your help. In terms of the output, I do not mind if the numbers are without spaces. Anyway, for some reason the code provided by Matt Jewett introduces just NAs. How about having values converted to numbers in the same column, I mean POS1 and not POS1X? – Kalenji May 23 '17 at 11:11
  • If there are any non-alphabetical characters, such as spaces or numbers, those would be converted into NAs, as there would be no match for them within the LETTERS vector. – Matt Jewett May 23 '17 at 12:22
  • An easy way to remove those would be to wrap the match() function inside a na.omit statement like this. `num <- paste(na.omit(match(strsplit(toupper(x), "")[[1]],LETTERS)), collapse = "")` – Matt Jewett May 23 '17 at 12:34
  • Also, to convert to numbers for the same column, the only thing you would need to change is to reference the column on the left hand side of the assignment arrow. IE: Change `DF$POS1X` to `DF$POS1` and it will overwrite the column with the numerical version. – Matt Jewett May 23 '17 at 12:49
1

You can create a map:

map<-data.frame(LETTERS,as.character(1:26))
names(map)<-c("letters","numbers")

Then a function:

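 let2nums <- function(string){
    splitme <- unlist(strsplit(string,""))
    # match() keeps the order of the input and any repeated letters
    returnme <- as.character(map$numbers[match(splitme, map$letters)])
    # collapse the positions into one string
    return(paste(returnme, collapse = ""))
 }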

> let2nums("ACGT")
[1] "13720"
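
The version in the question's update selects rows with `%in%`, which returns matches in the order of the lookup table and only once each. The result can therefore have a different length for each string, and `mapply` then falls back to returning a list, which is a likely source of the odd column. A small sketch of the difference, using `match()` as the order-preserving alternative (variable names are illustrative only):

```r
splitme <- strsplit("GCG", "")[[1]]   # "G" "C" "G"

# %in% asks "which LETTERS occur in splitme?":
# the result follows table order and duplicates collapse
by_in <- which(LETTERS %in% splitme)

# match() asks "where is each element of splitme in LETTERS?":
# the result follows input order and keeps repeats
by_match <- match(splitme, LETTERS)
```

Here `by_in` has two elements while `by_match` has three, one per input character.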
amonk
  • 1,769
  • 2
  • 18
  • 27