
I have a codon usage table (http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=10029&aa=1&style=GCG). I would like to generate a vector of the most used codons, one for each amino acid residue. There are 20 naturally occurring amino acids plus the stop codon (End), so my vector will have length 21. I've tried using grep, but it either takes only one pattern at a time or searches for all patterns at once, which doesn't help. Is there a way of doing this without a loop?

biomiha
  • What does your input data look like? Do you already know the correct open reading frame? Please create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that clearly shows some sample input and desired output. – MrFlick Jul 03 '14 at 06:55
  • The input is the webpage I linked. I could generate a text file with the data, but parsing the webpage directly would be better, because switching to a different codon usage table would then only mean pasting a different web address into the parser. – biomiha Jul 03 '14 at 12:15
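As a reproducible example that does not depend on the website, here is a sketch of a few lines laid out like the GCG-style table (columns AmAcid, Codon, Number, /1000, Fraction), read with `read.table()` in the same way the answer below reads the page's `<pre>` block. The layout is an assumption based on the page's GCG style, and all rows and counts are invented for illustration.

```r
# Invented sample in the assumed GCG layout: amino acid (three-letter code),
# codon, raw count, frequency per thousand, fraction within the amino acid.
gcg_lines <- c(
  "AmAcid  Codon     Number    /1000     Fraction",
  "Gly     GGG       150.00    15.00     0.22",
  "Gly     GGA       170.00    17.00     0.25",
  "Gly     GGT       140.00    14.00     0.21",
  "Gly     GGC       220.00    22.00     0.32",
  "Met     ATG       230.00    23.00     1.00",
  "End     TGA         1.50     0.15     0.50",
  "End     TAG         0.70     0.07     0.23",
  "End     TAA         0.80     0.08     0.27"
)

# read.table() accepts a character vector via text = (one element per line);
# the "/1000" header becomes the syntactic column name "X.1000".
df <- read.table(text = gcg_lines, header = TRUE, stringsAsFactors = FALSE)
str(df)
```

The desired output for this sample would be one codon per amino acid: GGC for Gly, ATG for Met and TGA for End.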

1 Answer


Here's what I think you would like to do. You can use the XML package to read the data and then dplyr to pick, for each amino acid, the codon with the highest count.

```r
# load packages
library(XML)
library(dplyr)
# parse the page and read the GCG table contained in its <pre> element
tt <- htmlParse('http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=10029&aa=1&style=GCG')
df <- read.table(text = xpathSApply(tt, "//pre", xmlValue),
                 header = TRUE,
                 fill = TRUE)
# keep the codon(s) with the largest count for each amino acid
# (%>% replaces the old %.% operator, which dplyr has since removed)
df.max <- df %>%
  group_by(AmAcid) %>%
  filter(Number == max(Number)) %>%
  select(AmAcid, Codon)
```

The result is a data frame with one row per amino acid, i.e. 21 rows (more if two codons tie for the highest count of some amino acid). Use `df.max$Codon` if you want a plain vector.
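If you would rather avoid extra packages, base R can do the same without an explicit loop: sort the rows so that the largest Number comes first within each amino acid, then keep the first row per amino acid with `duplicated()`. This is a sketch assuming `df` has the AmAcid, Codon and Number columns read above; the small inline table only stands in for the downloaded one and its counts are invented. Unlike `filter(Number == max(Number))`, this keeps exactly one codon per amino acid when two codons tie.

```r
# Stand-in for the parsed table (invented counts); use the real df instead.
df <- data.frame(
  AmAcid = c("Gly", "Gly", "Gly", "Gly", "Met", "End", "End", "End"),
  Codon  = c("GGG", "GGA", "GGT", "GGC", "ATG", "TGA", "TAG", "TAA"),
  Number = c(150, 170, 140, 220, 230, 1.5, 0.7, 0.8),
  stringsAsFactors = FALSE
)

# Order rows by amino acid, then by descending count within each amino acid.
ord <- df[order(df$AmAcid, -df$Number), ]

# The first row of each amino acid is now its most used codon;
# duplicated() flags every later row, so ties resolve to a single codon.
top <- ord[!duplicated(ord$AmAcid), ]

# Named character vector with one codon per amino acid
# (21 entries for a full table containing 20 amino acids plus End).
best_codon <- setNames(top$Codon, top$AmAcid)
print(best_codon)
```

Indexing by name, e.g. `best_codon["Gly"]`, then returns the preferred codon for a given residue.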

shadow