I am trying to parse a chemical formula (in the format, for example: Al2O3
or O3
or C
or C11H22O12
) in C# from a string. It works fine unless there is only one atom of a particular element (e.g. the oxygen atom in H2O
). How can I fix that problem, and in addition, is there a better way to parse a chemical formula string than I am doing?
ChemicalElement is a class representing a chemical element. It has properties AtomicNumber (int), Name (string), Symbol (string). ChemicalFormulaComponent is a class representing a chemical element and atom count (e.g. part of a formula). It has properties Element (ChemicalElement), AtomCount (int).
The rest should be clear enough to understand (I hope) but please let me know with a comment if I can clarify anything, before you answer.
Here is my current code:
/// <summary>
/// Parses a chemical formula from a string.
/// </summary>
/// <param name="chemicalFormula">The string to parse.</param>
/// <exception cref="FormatException">The chemical formula was in an invalid format.</exception>
public static Collection<ChemicalFormulaComponent> FormulaFromString(string chemicalFormula)
{
Collection<ChemicalFormulaComponent> formula = new Collection<ChemicalFormulaComponent>();
string nameBuffer = string.Empty;
int countBuffer = 0;
for (int i = 0; i < chemicalFormula.Length; i++)
{
char c = chemicalFormula[i];
if (!char.IsLetterOrDigit(c) || !char.IsUpper(chemicalFormula, 0))
{
throw new FormatException("Input string was in an incorrect format.");
}
else if (char.IsUpper(c))
{
// Add the chemical element and its atom count
if (countBuffer > 0)
{
formula.Add(new ChemicalFormulaComponent(ChemicalElement.ElementFromSymbol(nameBuffer), countBuffer));
// Reset
nameBuffer = string.Empty;
countBuffer = 0;
}
nameBuffer += c;
}
else if (char.IsLower(c))
{
nameBuffer += c;
}
else if (char.IsDigit(c))
{
if (countBuffer == 0)
{
countBuffer = c - '0';
}
else
{
countBuffer = (countBuffer * 10) + (c - '0');
}
}
}
return formula;
}