I'm not sure I have the atomic weights right, but I guess that can be fixed.
The tricky bit for me was trying to figure out cases like CH3, where the simple (letters)(numbers) regex doesn't work.
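To make the CH3 problem concrete, here is a quick sketch. The "naive" pattern is only my guess at what a simple (letters)(numbers) regex would look like; it is not taken from the question.

```python
import re

# Hypothetical naive pattern: a run of letters followed by a run of digits.
# It glues C and H together and requires at least one digit.
naive = re.findall(r"([A-Za-z]+)([0-9]+)", "CH3")
print(naive)  # [('CH', '3')]

# Listing the known symbols and making the count optional splits it properly.
better = re.findall(r"(Cl|H|O|C|N)([0-9]*)", "CH3")
print(better)  # [('C', ''), ('H', '3')]
```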
re.findall does the heavy lifting here. There may be a better way to parse the C2H4 string, and I'd be interested in it, but this works. Obviously you can clean things up and make neater functions, etc.
But the regex here, which I suspect you are most interested in, says: look for one of the known element symbols (Cl, H, O, C or N), followed by an optional string of digits. Each match is passed to calc_weight, which takes the element and strips the letters off the match to get the count, defaulting to 1 when there are no digits. The element is looked up in the weight table; if it is not there, the script exits with an error. Otherwise the weight is multiplied by the count.
import re
import sys
# Approximate standard atomic weights, keyed by lowercase element symbol.
weight = { 'cl': 35.45, 'n': 14.007, 'o': 15.999, 'c': 12.011, 'h': 1.008 }
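def calc_weight(my_str):
    # my_str is one (whole match, element) tuple from re.findall.
    elt = my_str[1].lower()
    if not re.search("[0-9]", my_str[0]):
        amt = 1  # no digits after the symbol means a single atom
    else:
        # Strip the leading letters, leaving only the count.
        amt = re.sub("^[a-zA-Z]+", "", my_str[0])
    if elt not in weight:
        sys.exit(elt + " is not a valid element.")
    return int(amt) * weight[elt]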
my_string = "C2H4"
a = re.findall("((Cl|H|O|C|N)[0-9]*)", my_string)
my_weight = 0
for b in a:
    my_weight += calc_weight(b)
print("Weight of", my_string, "is", my_weight)
A word on the code: findall returns one tuple per match because the pattern has two pairs of parentheses, i.e. two capture groups. my_str[0] is the whole match (e.g. C2), and my_str[1] is just the element (e.g. C).
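If the tuple business is unclear, this small sketch prints what findall hands back for the same pattern and input. The variable names are just for illustration.

```python
import re

# Two groups: the outer one is the whole "element + count" chunk,
# the inner one is just the element symbol.
matches = re.findall("((Cl|H|O|C|N)[0-9]*)", "C2H4")
print(matches)  # [('C2', 'C'), ('H4', 'H')]

for whole, element in matches:
    # Whatever follows the symbol is the count; empty means one atom.
    count = whole[len(element):] or "1"
    print(element, int(count))
```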
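Hope this helps. Note you can probably improve on the code: throw a better error message for a bad string, etc. I wanted to at least allow for capitalization, e.g. if someone typed Mg or MG it should not make a difference, which is why the lookup lowercases the element. As written, though, the regex itself is case-sensitive and only knows five elements, so for that to really work you would need to pass re.IGNORECASE to findall and add the element to both the pattern and the weight table. Also note that any characters the regex does not recognize are silently skipped rather than reported.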
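For what it's worth, here is one way the cleanup might look. This is only a sketch: the name molecular_weight, the WEIGHTS table layout and the choice to raise ValueError are my own assumptions, not anything from the question. It builds the pattern from the table, ignores case, and complains about anything it cannot parse.

```python
import re

# Approximate standard atomic weights; extend as needed.
WEIGHTS = {"Cl": 35.45, "N": 14.007, "O": 15.999, "C": 12.011, "H": 1.008}

# Longest symbols first so "Cl" is tried before "C".
_SYMBOLS = "|".join(sorted(WEIGHTS, key=len, reverse=True))
_TOKEN = re.compile(rf"({_SYMBOLS})([0-9]*)", re.IGNORECASE)

def molecular_weight(formula):
    """Hypothetical helper: sum the weights of a flat formula like 'C2H4'."""
    total = 0.0
    pos = 0
    while pos < len(formula):
        m = _TOKEN.match(formula, pos)
        if m is None:
            # Unlike findall, this refuses to skip unknown characters.
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        symbol = m.group(1).capitalize()  # 'CL' or 'cl' -> 'Cl'
        count = int(m.group(2) or 1)      # no digits means one atom
        total += count * WEIGHTS[symbol]
        pos = m.end()
    return total

print(molecular_weight("C2H4"))
print(molecular_weight("ch3CL"))
```

One caveat: ignoring case becomes ambiguous once the table grows (CO could be carbon plus oxygen or cobalt), and this does not handle parentheses such as Ca(OH)2.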