Say I have a molecular formula like "C5Cl2NO2S" for which I'd like to calculate the molecular mass in R. I thought the easiest way would be to use a regular expression to analyze and split the formula into its elemental components, and hand those over to a separate function that performs the calculation. However, I'm facing the problem that when I hand over the backreferences of my regex, they are not evaluated but passed on as the literal strings "\\1", "\\2".
This is my attempt:
masses <- list(
  C = 12,
  H = 1.01,
  Cl = 34.97,
  N = 14.00,
  O = 15.99,
  P = 30.97,
  S = 31.97
)
elementMass <- function( element, count ) {
  if( count == "" ) {
    count <- "1"
  }
  return( as.character( masses[[ element ]] * as.numeric( count ) ) )
}
sumFormula2Mass <- function( x ){
  y <- 0.0
  for( e in x ) {
    if( e != "" ) {
      y <- y + as.numeric( sub( "^(C|H|Cl|N|O|P|S)([0-9]*)$", elementMass("\\1", "\\2"), e ) )
    }
  }
  return( y )
}
sub(
  "^(C[0-9]*)?(H[0-9]*)?(Cl[0-9]*)?(N[0-9]*)?(O[0-9]*)?(P[0-9]*)?(S[0-9]*)?$",
  sumFormula2Mass( c("\\1", "\\2", "\\3", "\\4", "\\5", "\\6", "\\7") ),
  "C5Cl2NO2S"
)
Any ideas on how to improve this? Many thanks.
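For what it's worth, the replacement argument of sub() is an ordinary R expression, so elementMass("\\1", "\\2") is evaluated with those literal strings before sub() does any matching. One possible way around this, as a minimal sketch rather than a definitive solution, is to extract the element tokens first with gregexpr() and regmatches() and do the arithmetic afterwards. The helper name formulaMass is made up for illustration, the masses are the values from the attempt above, and the sketch assumes the formula contains only those seven elements, with no brackets or charges.

```r
# Element masses copied from the attempt above, as a named numeric vector.
masses <- c(C = 12, H = 1.01, Cl = 34.97, N = 14.00, O = 15.99, P = 30.97, S = 31.97)

# formulaMass is a hypothetical helper name, not part of any package.
formulaMass <- function(formula) {
  # One token per element symbol plus optional count, e.g. "C5", "Cl2", "N".
  # "Cl" is listed before "C" so that chlorine is not read as carbon.
  pattern <- "(Cl|C|H|N|O|P|S)[0-9]*"
  tokens <- regmatches(formula, gregexpr(pattern, formula, perl = TRUE))[[1]]

  # Assumption: the tokens must cover the whole formula; otherwise it
  # contains an element or character this sketch does not know about.
  stopifnot(identical(paste(tokens, collapse = ""), formula))

  elements <- sub("[0-9]*$", "", tokens)   # strip the trailing count
  counts <- sub("^[A-Za-z]+", "", tokens)  # strip the leading symbol
  counts[counts == ""] <- "1"              # a missing count means one atom

  sum(masses[elements] * as.numeric(counts))
}

formulaMass("C5Cl2NO2S")  # roughly 207.89 with the masses above
```

Unlike the single anchored pattern in the attempt, this does not require the elements to appear in a fixed order, so "H2O" or "NO2" style inputs are handled the same way.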