This question asks for a simple parser for a simple kind of equation. I am assuming that you do not need to support irregular equations with parentheses and unusual symbols.
Just to be safe, I would lean on String.split() for the outer structure rather than trying to match the whole equation with one large regex.
A (relatively) simple solution would do the following:
- Split on ->
- Make sure there are two pieces
- Sum up each piece:
  - Split on +
  - Parse each molecule and sum up the atoms:
    - Parse optional multiplier
    - Find all matches to molecule regex
    - Convert the numbers and add them up by element
- Compare the results
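The two splitting steps in the outline can be tried on their own before any molecule parsing is involved. The following is only a rough sketch: the class SplitSketch and its method names are made up for illustration and are not part of the parser in this answer. One thing it makes visible is that String.split() takes a regex, so the plus sign has to be escaped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only to illustrate the splitting steps.
final class SplitSketch
{
    // Splits "lhs -> rhs" into exactly two sides, or throws.
    static String[] splitSides(String eqn)
    {
        String[] sides = eqn.split("->");
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected exactly one -> in: " + eqn);
        }
        return sides;
    }

    // Splits one side into trimmed molecule strings.
    // String.split() takes a regex, so "+" must be written as "\\+".
    static List<String> splitMolecules(String side)
    {
        List<String> molecules = new ArrayList<>();
        for (String molecule : side.split("\\+")) {
            molecules.add(molecule.trim());
        }
        return molecules;
    }

    public static void main(String[] args)
    {
        String[] sides = splitSides("12 C O2 + 6 H2O -> 2 C6H12O6 + 12 O2");
        System.out.println(splitMolecules(sides[0])); // [12 C O2, 6 H2O]
        System.out.println(splitMolecules(sides[1])); // [2 C6H12O6, 12 O2]
    }
}
```

Each string in the resulting lists is what the molecule-level step would then have to handle.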
Each level of parsing can be handily done in a separate method. Using regex is probably the best way to parse the individual molecules, so I borrowed the expression from here: https://codereview.stackexchange.com/questions/2345/simplify-splitting-a-string-into-alpha-and-numeric-parts. The regex is pretty much trivial, so please bear with me:
import java.util.Map;
import java.util.HashMap;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SimpleChemicalEquationParser
{
    // Counts of elements on each side
    private Map<String, Integer> left;
    private Map<String, Integer> right;

    public SimpleChemicalEquationParser(String eqn)
    {
        this.left = new HashMap<>();
        this.right = new HashMap<>();
        parse(eqn);
    }

    public boolean isBalanced()
    {
        return left.equals(right);
    }

    public boolean isSimpleBalanced()
    {
        return leftCount() == rightCount();
    }

    public int leftCount()
    {
        return left.values().stream().mapToInt(Integer::intValue).sum();
    }

    public int rightCount()
    {
        return right.values().stream().mapToInt(Integer::intValue).sum();
    }

    private void parse(String eqn)
    {
        String[] sides = eqn.split("->");
        if(sides.length != 2) {
            throw new RuntimeException("Check your equation. There should be exactly one -> symbol somewhere");
        }
        parseSide(sides[0], this.left);
        parseSide(sides[1], this.right);
    }

    private void parseSide(String side, Map<String, Integer> counter)
    {
        String[] molecules = side.split("\\+");
        for(String molecule : molecules) {
            parseMolecule(molecule, counter);
        }
    }

    private void parseMolecule(String molecule, Map<String, Integer> counter)
    {
        molecule = molecule.trim();
        Matcher matcher = Pattern.compile("([a-zA-Z]+)\\s*([0-9]*)").matcher(molecule);
        int multiplier = 1;
        int endIndex = 0;
        while(matcher.find()) {
            String separator = molecule.substring(endIndex, matcher.start()).trim();
            if(!separator.isEmpty()) {
                // Check if there is a premultiplier before the first element
                if(endIndex == 0) {
                    String multiplierString = molecule.substring(0, matcher.start()).trim();
                    try {
                        multiplier = Integer.parseInt(multiplierString);
                    } catch(NumberFormatException nfe) {
                        throw new RuntimeException("Invalid prefix \"" + multiplierString +
                                                   "\" to molecule \"" + molecule.substring(matcher.start()) + "\"");
                    }
                } else {
                    throw new RuntimeException("Nonsensical characters \"" + separator +
                                               "\" in molecule \"" + molecule + "\"");
                }
            }
            parseElement(multiplier, matcher.group(1), matcher.group(2), counter);
            endIndex = matcher.end();
        }
        if(endIndex != molecule.length()) {
            throw new RuntimeException("Invalid end to side: \"" + molecule.substring(endIndex) + "\"");
        }
    }

    private void parseElement(int multiplier, String element, String atoms, Map<String, Integer> counter)
    {
        if(!atoms.isEmpty())
            multiplier *= Integer.parseInt(atoms);
        if(counter.containsKey(element))
            multiplier += counter.get(element);
        counter.put(element, multiplier);
    }

    public static void main(String[] args)
    {
        // Collect all command line arguments into one equation
        StringBuilder sb = new StringBuilder();
        for(String arg : args)
            sb.append(arg).append(' ');
        String eqn = sb.toString();

        SimpleChemicalEquationParser parser = new SimpleChemicalEquationParser(eqn);
        boolean simpleBalanced = parser.isSimpleBalanced();
        boolean balanced = parser.isBalanced();

        System.out.println("Left: " + parser.leftCount());
        for(Map.Entry<String, Integer> entry : parser.left.entrySet()) {
            System.out.println(" " + entry.getKey() + ": " + entry.getValue());
        }
        System.out.println();

        System.out.println("Right: " + parser.rightCount());
        for(Map.Entry<String, Integer> entry : parser.right.entrySet()) {
            System.out.println(" " + entry.getKey() + ": " + entry.getValue());
        }
        System.out.println();

        System.out.println("Atom counts match: " + simpleBalanced);
        System.out.println("Elements match: " + balanced);
    }
}
All the work is done by the parse method and its subordinates, which form a small call tree. Since this approach makes it especially easy to check that the atoms of each element actually balance, I have gone ahead and done that here. The class prints the atom counts on each side of the equation, whether the raw totals match, and whether the counts match element by element. Here are a couple of example runs:
OP's original example:
$ java -cp . SimpleChemicalEquationParser '12 C O2 + 6 H2O -> 2 C6H12O6 + 12 O2'
Left: 54
C: 12
H: 12
O: 30
Right: 72
C: 12
H: 24
O: 36
Atom counts match: false
Elements match: false
Added ozone to make the total number of atoms match up:
$ java -cp . SimpleChemicalEquationParser '12 C O2 + 6 H2O + 6 O3 -> 2 C6H12O6 + 12 O2'
Left: 72
C: 12
H: 12
O: 48
Right: 72
C: 12
H: 24
O: 36
Atom counts match: true
Elements match: false
Added water to make everything match up:
$ java -cp . SimpleChemicalEquationParser '12 C O2 + 12 H2O -> 2 C6H12O6 + 12 O2'
Left: 72
C: 12
H: 24
O: 36
Right: 72
C: 12
H: 24
O: 36
Atom counts match: true
Elements match: true
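The difference between the two checks is easiest to see with the counts from the ozone run, where the totals agree but the per-element counts do not. The snippet below is a small standalone sketch: BalanceCheckDemo and total are names invented for illustration, and Map.of assumes Java 9 or later. It compares the maps the same way isSimpleBalanced() and isBalanced() do.

```java
import java.util.Map;

// Hypothetical demo class; the counts are copied from the ozone run above.
final class BalanceCheckDemo
{
    // Sums all atom counts on one side, regardless of element.
    static int total(Map<String, Integer> counts)
    {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args)
    {
        Map<String, Integer> left = Map.of("C", 12, "H", 12, "O", 48);
        Map<String, Integer> right = Map.of("C", 12, "H", 24, "O", 36);

        // Totals are both 72, so the raw atom counts match
        System.out.println("Atom counts match: " + (total(left) == total(right)));
        // Map.equals compares key by key, so H and O give the mismatch away
        System.out.println("Elements match: " + left.equals(right));
    }
}
```

Only the per-element comparison says anything about whether the equation is really balanced; the total is at best a quick sanity check.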
Notice that I added a space between C and O in CO2. This is because my current regex for molecules, ([a-zA-Z]+)\\s*([0-9]*) (written here as it appears in the Java string literal), allows any combination of letters to represent an element, so CO2 would be read as two atoms of an element called CO. If your elements are always going to be simple one-letter elements, change this to ([a-zA-Z])\\s*([0-9]*) (remove the + quantifier). If they are going to be properly named, one- or two-letter symbols with the second letter always lowercase, do this instead: ([A-Z][a-z]?)\\s*([0-9]*). I recommend the latter option. For both modified versions, the space in C O2 will no longer be necessary.
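To see how the three expressions differ, it helps to run them against the same input. The following is a minimal sketch rather than a change to the parser: ElementRegexDemo and tokens are hypothetical names, and a missing count is printed as 1 purely for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the three element regexes.
final class ElementRegexDemo
{
    // Returns each match as "symbol:count", with a missing count shown as 1.
    static List<String> tokens(String regex, String molecule)
    {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(molecule);
        while (matcher.find()) {
            String count = matcher.group(2).isEmpty() ? "1" : matcher.group(2);
            result.add(matcher.group(1) + ":" + count);
        }
        return result;
    }

    public static void main(String[] args)
    {
        String anyLetters = "([a-zA-Z]+)\\s*([0-9]*)";
        String oneLetter = "([a-zA-Z])\\s*([0-9]*)";
        String properSymbol = "([A-Z][a-z]?)\\s*([0-9]*)";

        System.out.println(tokens(anyLetters, "CO2"));    // [CO:2]
        System.out.println(tokens(oneLetter, "CO2"));     // [C:1, O:2]
        System.out.println(tokens(properSymbol, "CO2"));  // [C:1, O:2]
        System.out.println(tokens(oneLetter, "NaCl"));    // [N:1, a:1, C:1, l:1]
        System.out.println(tokens(properSymbol, "NaCl")); // [Na:1, Cl:1]
    }
}
```

The NaCl lines show why the last pattern is the safer choice: the one-letter version breaks two-letter symbols apart, while the capital-then-optional-lowercase version keeps them together.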