I am working with some genetic data and one of my columns isn't in the format I want it to be. I don't know how much biology is talked about on here, but I am trying to fix how my amino acids are shown in my data.
Amino acids obviously have a name but they also have a 3 letter abbreviation and a 1 letter abbreviation. My data has the amino acids in the 3 letter form but I want to change them to the 1 letter abbreviation. Here is an example of my data.
chr location effect impact AA_change
1 12543 missense_variant MODERATE p.Ala12Val
1 52367 missense_variant MODERATE p.Leu54Pro
1 752347 missense_variant MODERATE p.Met99Ser
1 984645 missense_variant MODERATE p.Lys34Ile
1 989845 missense_variant MODERATE p.Arg4Cys
1 999854 missense_variant MODERATE p.His43Gly
1 999855 missense_variant MODERATE p.Glu14Phe
dat <- structure(list(chr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L), location = c(12543L,
52367L, 752347L, 984645L, 989845L, 999854L, 999855L), effect = c("missense_variant",
"missense_variant", "missense_variant", "missense_variant", "missense_variant",
"missense_variant", "missense_variant"), impact = c("MODERATE",
"MODERATE", "MODERATE", "MODERATE", "MODERATE", "MODERATE", "MODERATE"
), AA_change = c("Ala12Val", "Leu54Pro", "Met99Ser", "Lys34Ile",
"Arg4Cys", "His43Gly", "Glu14Phe")), .Names = c("chr", "location",
"effect", "impact", "AA_change"), row.names = c(NA, -7L), class = "data.frame")
Here is a list of the 3 letter amino acid abbreviations and their 1 letter equivalents.
Ala == A
Arg == R
Asn == N
Asp == D
Cys == C
Glu == E
Gln == Q
Gly == G
His == H
Ile == I
Leu == L
Lys == K
Met == M
Phe == F
Pro == P
Ser == S
Thr == T
Trp == W
Tyr == Y
Val == V
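The same table can be written as a named character vector in R, which makes it usable as a lookup. This is only a sketch; the name `aa_codes` is made up for illustration.

```r
# Three-letter -> one-letter amino acid codes as a named character vector.
# `aa_codes` is an illustrative name, not something from an existing package.
aa_codes <- c(
  Ala = "A", Arg = "R", Asn = "N", Asp = "D", Cys = "C",
  Glu = "E", Gln = "Q", Gly = "G", His = "H", Ile = "I",
  Leu = "L", Lys = "K", Met = "M", Phe = "F", Pro = "P",
  Ser = "S", Thr = "T", Trp = "W", Tyr = "Y", Val = "V"
)

# Indexing by name looks up the one-letter code for each three-letter code
unname(aa_codes[c("Ala", "Val")])
```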
I feel like there is a simple function that can be written to do this, but I am struggling to think of how. I am used to changing just one part of a column, not two things at once. So what I am asking is how can I change this
Ala12Val
Leu54Pro
Met99Ser
Lys34Ile
Arg4Cys
His43Gly
Glu14Phe
To this
A12V
L54P
M99S
K34I
R4C
H43G
E14F
Is this something that can be done?
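One possible direction, as a minimal base R sketch rather than a definitive solution: split each value into first residue, position, and second residue with a regular expression, look up both residues, and paste the pieces back together. The names `aa_codes` and `to_one_letter` are hypothetical, and the pattern assumes every value looks like `Ala12Val`, optionally with a leading `p.`.

```r
# Lookup table built from two parallel vectors (same 20 codes as listed above)
three <- c("Ala", "Arg", "Asn", "Asp", "Cys", "Glu", "Gln", "Gly", "His", "Ile",
           "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val")
one <- strsplit("ARNDCEQGHILKMFPSTWYV", "")[[1]]
aa_codes <- setNames(one, three)

# Hypothetical helper: convert "Ala12Val" (or "p.Ala12Val") to "A12V"
to_one_letter <- function(x) {
  # Groups: optional "p." prefix, first residue, position, second residue
  pattern <- "^(p\\.)?([A-Za-z]{3})([0-9]+)([A-Za-z]{3})$"
  from <- aa_codes[sub(pattern, "\\2", x)]
  pos  <- sub(pattern, "\\3", x)
  to   <- aa_codes[sub(pattern, "\\4", x)]
  out  <- paste0(from, pos, to)
  # Values that do not fit the pattern, or use an unknown code, become NA
  out[is.na(from) | is.na(to)] <- NA_character_
  out
}

x <- c("Ala12Val", "Leu54Pro", "Met99Ser", "Lys34Ile",
       "Arg4Cys", "His43Gly", "Glu14Phe")
to_one_letter(x)
```

Applied to the data frame above, this would be something like `dat$AA_change <- to_one_letter(dat$AA_change)`. Stop codons, frameshifts, or other notations are not handled and would come back as `NA`.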