I have a codon usage table (http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=10029&aa=1&style=GCG). I would like to generate a vector of the most used codons (one for each amino acid residue). There are 20 naturally occurring amino acids (AmAcid) plus the stop codon (End), so my vector will have length 21. I've tried using grep, but it takes only one pattern at a time, or searches for all patterns at once, which doesn't help. Is there a way of doing this without a loop?
-
What does your input data look like? Do you already know the correct open reading frame? Please create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that clearly shows some sample input and desired output. – MrFlick Jul 03 '14 at 06:55
-
The input is the webpage I added. I could essentially generate a text file with the data but the webpage would be even better because switching to a different codon usage table would mean just pasting a different web address into the parser. – biomiha Jul 03 '14 at 12:15
1 Answer
Here's what I think you would like to do. You can use the package XML to read the data and then dplyr to calculate the maximum.
# load packages
library(XML)
library(dplyr)
# read the table from the <pre> block of the page
tt <- htmlParse('http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=10029&aa=1&style=GCG')
df <- read.table(text = xpathSApply(tt, "//pre", xmlValue),
                 header = TRUE,
                 fill = TRUE)
# keep the most used codon(s) for each amino acid
df.max <- df %>%
  group_by(AmAcid) %>%
  filter(Number == max(Number)) %>%
  select(AmAcid, Codon)
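The result is then a data.frame with 21 rows (one per amino acid, plus End), assuming no two codons tie for the maximum within an amino acid. You can access the column Codon (df.max$Codon) if you want to get a vector.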
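If a plain named vector is the goal, the same idea can be sketched in base R without an explicit loop. This is only an illustration under stated assumptions: `codon_table` is a hypothetical stand-in for the parsed table, it has the same AmAcid, Codon and Number columns, and the counts in it are made up rather than taken from the Kazusa page.

```r
# Hypothetical stand-in for the parsed table; the counts are invented.
codon_table <- data.frame(
  AmAcid = c("Gly", "Gly", "Gly", "Gly", "End", "End", "End"),
  Codon  = c("GGG", "GGA", "GGT", "GGC", "TGA", "TAG", "TAA"),
  Number = c(10, 20, 15, 30, 5, 2, 3),
  stringsAsFactors = FALSE
)

# Sort by amino acid, then by descending count, so the most used
# codon is the first row within each amino acid.
ordered <- codon_table[order(codon_table$AmAcid, -codon_table$Number), ]

# Keep only the first row per amino acid; on a tie, the codon listed
# first in the original table is kept.
top <- ordered[!duplicated(ordered$AmAcid), ]

# Named character vector: names are amino acids, values are codons.
most_used <- setNames(top$Codon, top$AmAcid)
print(most_used)
```

Unlike the filter on the maximum, this always returns exactly one codon per amino acid, which may or may not be what you want when two codons are used equally often.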

shadow