Parsing a string that represents a chemical reaction and verify if the reaction is possible

Question

I have to write a program that takes a user's chemical equation as an input, like 12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2, and watch if the amount of Atoms is on both sites the same. Is there any way to calculate and parse this easily?

For example:

12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2

12*2+6*2 -> 2*6+2*12+2*6+12*2

In this case there should be the Output "false".

This is my code but it's actually is only to try out something:

public static void main(String[] args) {
    Scanner s = new Scanner(System.in);
    List<String> list = new ArrayList<String>();
    String input = "";
    while (!(input.equals("end"))) {
        input = s.nextLine();
        list.add(input);
    }
    list.remove(list.size() - 1);
    for (int i = 0; i < list.size(); i++) {
        int before = 0;
        int after = 0;
        String string = list.get(i);
        string = besserUmwandeln(string);
        System.out.println(string);
    }
}

public static String besserUmwandeln(String string) {
    string = string.replace("-", "");
    string = string.trim().replaceAll(("\\s+"), " ");
    string = string.replace(' ', '*');
    StringBuilder builder = new StringBuilder(string);
    System.out.println(string);
    for (int k = 0; k < builder.length(); k++) {
        if (Character.isUpperCase(builder.charAt(k))) {
            builder.setCharAt(k, ':');
        }
        if (Character.isLowerCase(builder.charAt(k))) {
            builder.setCharAt(k, '.');
        }
        if (Character.isDigit(builder.charAt(k))) {
        } else {
        }
    }
    for (int j = 0; j < builder.length(); j++) {
        if (j < builder.length() && builder.charAt(j) == ':' && builder.charAt(j + 1) == '.') {
            builder.deleteCharAt(j + 1);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == ':' && builder.charAt(i + 1) == ':') {
            builder.deleteCharAt(i);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '+' && builder.charAt(i + 1) == '*') {
            builder.deleteCharAt(i + 1);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '*' && builder.charAt(i + 1) == '+') {
            builder.deleteCharAt(i);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '*' && builder.charAt(i + 1) == '>') {
            builder.deleteCharAt(i);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '>' && builder.charAt(i + 1) == '*') {
            builder.deleteCharAt(i + 1);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '*' && builder.charAt(i + 1) == ':') {
            builder.deleteCharAt(i + 1);
        }
    }


    return builder.toString();
}

One approach is to define an EBNF grammar, and build a parsing tool. You could do this **beautifully** in C++: you could even *overload* the pointer to member operator `->`. See Boost Spirit. IMHO Java is not for serious scientific programming. — Bathsheba, Dec 13 '16 at 19:49
@Bathsheba Thank you for your answer. I know that Java is not supposed for serious scientifitc programms but in this case I need to develop it in Java because I have no other choice. — , Dec 13 '16 at 19:57
Hogwash! Build it in C++ and provide a JNI. See `native` keyword in Java. I'll avoid the photosynthesis puns! — Bathsheba, Dec 13 '16 at 19:58
@Bathsheba I wish I could do this but it would be my first time to use a Java Native Interface. Also I have not much experience with other programming languages. — , Dec 13 '16 at 20:03
If you're doing your PhD it's well worth learning C++. You'll be enlightened... — Bathsheba, Dec 13 '16 at 20:04
Describe your code better and explain what it is currently doing wrong. It's fine that it is not polished or even complete yet. It's not fine that you dumped it on us with no description or explanation. — Mad Physicist, Dec 13 '16 at 20:18
@Bathsheba The recommendations that you gave here are at least questionable. I'm not up to language bashing, but the potential advantages that C++ may offer here would be lost in the JNI layer (which is plain C). This *is* something that can clearly and appropriately be solved in Java, and recommending a different language for a clearly tagged question is not really productive. — Marco13, Dec 13 '16 at 20:19
@Bathsheba. Besides OP asking specifically about java, what in this problem strikes you as "serious scientific programming"? Do you really think that OP cares what your personal favorite language is? — Mad Physicist, Dec 13 '16 at 20:19
Possibly not but I do have experience in both languages (I'll be the first to admit that I have more in C++ and C than Java), but what I can offer, is that if I were presented with this brief, I'd pick C++ as the implementation. That's pretty much it. — Bathsheba, Dec 13 '16 at 20:21
Strongly related: http://stackoverflow.com/questions/2974362/parsing-a-chemical-formula (nearly a duplicate - at least, **if** the other question had a nice answer, it would be a large building block for the solution of this one...) — Marco13, Dec 13 '16 at 20:24
@Marco13 I already saw this post but it's actually a bit different. And I tried it with the solution of there but I don't know how to use the Regex. But thank you! — , Dec 13 '16 at 20:26
[How to use Regular Expressions (Java)](https://docs.oracle.com/javase/tutorial/essential/regex/) — jacknad, Dec 13 '16 at 20:31
See this: http://codereview.stackexchange.com/questions/2345/simplify-splitting-a-string-into-alpha-and-numeric-parts — Mad Physicist, Dec 13 '16 at 21:19

score 4 · Answer 1 · edited Apr 13 '17 at 12:40

This question is asking for a simple parser for a simple type of equation. I am assuming that you do not need to support all kinds of irregular equations with parentheses and weird symbols.

Just to be safe, I would use a lot of String.split() instead of regexes.

A (relatively) simple solution would do the following:

Split on ->
Make sure there are two pieces
Sum up each piece:
1. Split on +
2. Parse each molecule and sum up the atoms:
  1. Parse optional multiplier
  2. Find all matches to molecule regex
  3. Convert the numbers and add them up by element
Compare the results

Each level of parsing can be handily done in a separate method. Using regex is probably the best way to parse the individual molecules, so I borrowed the expression from here: https://codereview.stackexchange.com/questions/2345/simplify-splitting-a-string-into-alpha-and-numeric-parts. The regex is pretty much trivial, so please bear with me:

import java.util.Map;
import java.util.HashMap;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SimpleChemicalEquationParser
{
    // Counts of elements on each side
    private Map<String, Integer> left;
    private Map<String, Integer> right;

    public SimpleChemicalEquationParser(String eqn)
    {
        this.left = new HashMap<>();
        this.right = new HashMap<>();
        parse(eqn);
    }

    public boolean isBalanced()
    {
        return left.equals(right);
    }

    public boolean isSimpleBalanced()
    {
        return leftCount() == rightCount();
    }

    public int leftCount()
    {
        return left.values().stream().mapToInt(Integer::intValue).sum();
    }

    public int rightCount()
    {
        return right.values().stream().mapToInt(Integer::intValue).sum();
    }

    private void parse(String eqn)
    {
        String[] sides = eqn.split("->");
        if(sides.length != 2) {
            throw new RuntimeException("Check your equation. There should be exactly one -> symbol somewhere");
        }
        parseSide(sides[0], this.left);
        parseSide(sides[1], this.right);
    }

    private void parseSide(String side, Map<String, Integer> counter)
    {
        String[] molecules = side.split("\\+");
        for(String molecule : molecules) {
            parseMolecule(molecule, counter);
        }
    }

    private void parseMolecule(String molecule, Map<String, Integer> counter)
    {
        molecule = molecule.trim();
        Matcher matcher = Pattern.compile("([a-zA-Z]+)\\s*([0-9]*)").matcher(molecule);
        int multiplier = 1;
        int endIndex = 0;
        while(matcher.find()) {
            String separator = molecule.substring(endIndex, matcher.start()).trim();
            if(!separator.isEmpty()) {
                // Check if there is a premultiplier before the first element
                if(endIndex == 0) {
                    String multiplierString = molecule.substring(0, matcher.start()).trim();
                    try {
                        multiplier = Integer.parseInt(multiplierString);
                    } catch(NumberFormatException nfe) {
                        throw new RuntimeException("Invalid prefix \"" + multiplierString +
                                                   "\" to molecule \"" + molecule.substring(matcher.start()) + "\"");
                    }
                } else {
                    throw new RuntimeException("Nonsensical characters \"" + separator +
                                               "\" in molecule \"" + molecule + "\"");
                }
            }
            parseElement(multiplier, matcher.group(1), matcher.group(2), counter);
            endIndex = matcher.end();
        }
        if(endIndex != molecule.length()) {
            throw new RuntimeException("Invalid end to side: \"" + molecule.substring(endIndex) + "\"");
        }
    }

    private void parseElement(int multiplier, String element, String atoms, Map<String, Integer> counter)
    {
        if(!atoms.isEmpty())
            multiplier *= Integer.parseInt(atoms);
        if(counter.containsKey(element))
            multiplier += counter.get(element);
        counter.put(element, multiplier);
    }

    public static void main(String[] args)
    {
        // Collect all command line arguments into one equation
        StringBuilder sb = new StringBuilder();
        for(String arg : args)
            sb.append(arg).append(' ');

        String eqn = sb.toString();
        SimpleChemicalEquationParser parser = new SimpleChemicalEquationParser(eqn);
        boolean simpleBalanced = parser.isSimpleBalanced();
        boolean balanced = parser.isBalanced();

        System.out.println("Left: " + parser.leftCount());
        for(Map.Entry<String, Integer> entry : parser.left.entrySet()) {
            System.out.println("    " + entry.getKey() + ": " + entry.getValue());
        }
        System.out.println();

        System.out.println("Right: " + parser.rightCount());
        for(Map.Entry<String, Integer> entry : parser.right.entrySet()) {
            System.out.println("    " + entry.getKey() + ": " + entry.getValue());
        }
        System.out.println();

        System.out.println("Atom counts match: " + simpleBalanced);
        System.out.println("Elements match: " + balanced);
    }
}

All the work is done by the parse method and it's subordinates, which make a sort of virtual call tree. Since this approach makes it especially easy to make sure that the atoms of each element are actually balanced out, I have gone ahead and done that here. This class prints the counts of the atoms on each side of the equation, whether or not the raw counts balance out, as well as whether or not they match my element type. Here are a couple of example runs:

OP's original example:

$ java -cp . SimpleChemicalEquationParser '12 C O2 + 6 H2O -> 2 C6H12O6 + 12 O2'
Left: 54
    C: 12
    H: 12
    O: 30

Right: 72
    C: 12
    H: 24
    O: 36

Atom counts match: false
Elements match: false

Added Ozone to make the number of atoms match up

$ java -cp . SimpleChemicalEquationParser '12 C O2 + 6 H2O + 6 O3 -> 2 C6H12O6 + 12 O2'
Left: 72
    C: 12
    H: 12
    O: 48

Right: 72
    C: 12
    H: 24
    O: 36

Atom counts match: true
Elements match: false

Added water to make everything match up

$ java -cp . SimpleChemicalEquationParser '12 C O2 + 12 H2O -> 2 C6H12O6 + 12 O2'
Left: 72
    C: 12
    H: 24
    O: 36

Right: 72
    C: 12
    H: 24
    O: 36

Atom counts match: true
Elements match: true

Notice that I added a space between C and O in CO2. This is because my current regex for molecules, ([a-zA-Z]+)\\s*([0-9]*), allows any combination of letters to represent an element. If your elements are always going to be simple one-letter elements, change this to ([a-zA-Z])\\s*([0-9]*) (remove the + quantifier). If they are going to be properly named, two letter combinations with the second letter always lowercase, do this instead: ([A-Z][a-z]?)\\s*([0-9]*). I recommend the latter option. For both modified versions, the space in C O2 will no longer be necessary.

score 2 · Accepted Answer · answered Dec 13 '16 at 20:19

2

So, every time I need to parse some text with Java, I mostly end up just using Regex. So I'd recommend you to also do so.

You can test regular expressions at regex101.com.

And also easily use it in Java:

final inputText = ...
final Pattern pattern = Patern.compile("Some regex code");
final Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
    System.out.println(matcher.group(0));
}

Inside Regex you can define capturing groups with ( and ) and then grab the results by matcher.group(int).

For example, you may first separate the equation using (.*) -> (.*).

Then loop the left and right group using find with: (\d+) (\w+)(?: \+| -|$).

After that you can use group(1) for the amount and group(2) for the element.

And if needed also iterate the second group (the element) for the exact element distribution using (\w)(\d?). Then the first group is the element, for example for the text CO2 it yields two hits, the first hit has group(1) -> C and no second group. The second hit has group(1) -> O and group(2) -> 2.

Test your regex here: regex101#Q6KMJo

answered Dec 13 '16 at 20:19

Zabuzard

25,064
8
58
82

1

I won't downvote, but doubt that this is really helpful: The crucial question would then be: **WHICH** regex to use? (You know what they say about regexes and problems....). Apart from that, I doubt that more complex formulas can be parsed with regex.... – Marco13 Dec 13 '16 at 20:23
Would be really nice if you provided the regex. – Mad Physicist Dec 13 '16 at 20:23
I exactly explained the technique to use and provided all **Regex** codes which are needed to parse the given formulas. Yes, it may be hard for complex formulas. – Zabuzard Dec 13 '16 at 20:24
Specifically, you would have to have a repeated capture group (at least in the knee-jerk regex I am envisioning) and it may be hard to extract the repeated captures. – Mad Physicist Dec 13 '16 at 20:24
I don't think so as you can couple the Regex with Java. Just use `while (matcher.find())`. This will iterate the whole input text and search for the given pattern. Yep, in practice (depending on the exact formulas) it may be hard. However, it may also be possible that this is enough for OP to solve his problems :) – Zabuzard Dec 13 '16 at 20:27
A [related question](http://stackoverflow.com/questions/2974362/parsing-a-chemical-formula) gives an example formula like `C6H2(NO2)3CH3`. (I'm not even sure whether these formulas are *regular* in the sense that they can be parsed with a regex - it might well be that they need some context or at least more complex grammar) – Marco13 Dec 13 '16 at 20:28
@Zabuza. It would be really helpful if you replaced `"Some regex code"` with the contents of regex101#Q6KMJo. Also, keep in mind that your regex is PCRE, and Java is not 100% Perl compatible. – Mad Physicist Dec 13 '16 at 20:34
@Zabuza Thank you for your answer! It helped me a lot. My last Problem is that I don't have always a digit before the chemical forumla. How can I use it or ignore it? – Dec 13 '16 at 21:05
@Zabuza Regex? [Now you have two problems](http://meta.stackexchange.com/a/20842/338941). – adentinger Mar 06 '17 at 19:36

score 0 · Answer 3 · answered Mar 08 '17 at 02:06

The way a proper parser such as ANTLR works is to 1) convert the text into a stream of lexical tokens, then 2) parse the tokens with lookahead into a parse tree.

Lookahead is useful to know when to "end" a particular structural level of parsing.

For your requirements, you might be able to skip the distinction between lexing and parsing and just parse from the text directly -- however, an appreciation and use of lookahead would potentially be useful.

In particular a buffer to hold the upcoming (remaining) text, test matches (eg regex) against it, and consume matches from the front could be useful. This could be implemented either by modifying the remaining string or by advancing an index within it.

Given such a buffer, your pseudocode might look like:

class EquationParser {
    protected ParseBuffer buffer;

    // parse Equation;      
    //      -- of form "Sum -> Sum".
    //      -- eg. "12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2"
    public Equation parseEquation() {

        Sum left = parseSum();
        matchAndConsume("->");
        Sum right = parseSum();
        return new Equation( left, right);
    }

    // parse Sum;
    //      -- eg. "12 CO2 + 6 H2O"
    public Sum parseSum() {
        // parse 1 or more Product terms;
        Sum result = new Sum();
        result.add( parseProduct());
        while (buffer.lookaheadMatch("\\+")) {
            buffer.consumeMatch("\\+");
            result.add( parseProduct());
        }
        return result;
    }

    // parse Product term;
    //      -- of form "N formula", or just "formula".
    //      -- eg. "12 CO2" or "CO2"
    public Product parseProduct() {
        int quantity = 1;
        if (buffer.lookaheadMatch("\\d+")) {
            quantity = Integer.parseInt( buffer.consumeMatch("\\d+"));
        }
        Formula formula = parseFormula();
        return new Product( quantity, formula);
    }

    // parse Formula;
    //      -- eg. "C6H12O6" or "CO2"
    public Formula parseFormula() {
        Formula result = new Formula();
        result.add( parseTerm());
        while (buffer.lookaheadMatch("[A-Z][a-z]?\\d*")) {
            result.add( parseTerm());
        }
        return result;
    }

    // parse Term;
    //      -- eg. "C6", "C", "Co6", or "Co6"
    public Term parseTerm() {
        // ... reader exercise to implement...
    }


    protected void matchAndConsume (String patt) {
        if (! buffer.lookaheadMatch( patt))
            throw ParseFailed("parse failed:  expected "+patt);
        buffer.consumeMatch( patt);
    }
}

This is conceptual example code, not tested & does not include the buffer or complete parser -- it is the reader's job to flesh these out to a complete solution.

Parsing a string that represents a chemical reaction and verify if the reaction is possible

3 Answers3

OP's original example:

Added Ozone to make the number of atoms match up

Added water to make everything match up

Linked