I have a text file that includes some mathematical expressions. I need to parse the text into components (words, sentences, punctuation, numbers and arithmetic signs) using regular expressions, evaluate the mathematical expressions, and return the text in its original form with the expressions replaced by their computed values. I have already done the parsing without regular expressions (and without the calculation). Now I am trying to do it with regular expressions, but I do not fully understand how to do this correctly. The input text looks like this:

Pete like mathematic 5+3 and jesica too sin(3).

In the output I need:

Pete like mathematic 8 and jesica too 0,14.

I need some advice with regex and calculation from people who know how to do this.

My code:

final static Pattern PUNCTUATION = Pattern.compile("([\\s.,!?;:]){1,}");
final static Pattern LETTER = Pattern.compile("([а-яА-Яa-zA-Z&&[^sin]]){1,}");
    List<Sentence> sentences = new ArrayList<Sentence>();
    List<PartOfSentence> parts = new ArrayList<PartOfSentence>();
    StringTokenizer st = new StringTokenizer(text, " \t\n\r:;.!?,/\\|\"\'",
            true);

The code with regex (not working):

while (st.hasMoreTokens()) {

        String s = st.nextToken().trim();
        int size = s.length();
        for (int i=0; i<s.length();i++){
        //with regex. not working variant
        Matcher m = LETTER.matcher(s);
        if (m.matches()){
            parts.add(new Word(s.toCharArray()));
        }
        m = PUNCTUATION.matcher(s);
        if (m.matches()){
            parts.add(new Punctuation(s.charAt(0)));
        }
        Sentence buf = new Sentence(parts);
        if (buf.getWords().size() != 0) {
            sentences.add(buf);
            parts = new ArrayList<PartOfSentence>();
        } else
            parts.add(new Punctuation(s.charAt(0)));

Without regex (working):

if (size < 1)
            continue;
        if (size == 1) {
            switch (s.charAt(0)) {
            case ' ':               
                continue;
            case ',':
            case ';':
            case ':':
            case '\'':
            case '\"':
                parts.add(new Punctuation(s.charAt(0)));
                break;
            case '.':
            case '?':
            case '!':
                parts.add(new Punctuation(s.charAt(0)));
                Sentence buf = new Sentence(parts);
                if (buf.getWords().size() != 0) {
                    sentences.add(buf);
                    parts = new ArrayList<PartOfSentence>();
                } else
                    parts.add(new Punctuation(s.charAt(0)));
                break;
            default:
                parts.add(new Word(s.toCharArray()));
            }

        } else {
            parts.add(new Word(s.toCharArray()));
        }
    }
iliya.rudberg

3 Answers

I think you could start by looking for "function" matches in your input String. Anything that does not match a function is simply returned unchanged.

For example, this short piece of code does, I hope, what you are looking for:

The class with the main method:

import java.util.ArrayList;
import java.util.StringTokenizer;

public class App {
    StringTokenizer st = new StringTokenizer("Pete likes Mathematics 3+3 and Jessica too 6+3.", " \t\n\r:;.!?,/\\|\"\'", true);

    public static void main(String[] args) {
        new App();
    }
    public App(){
        ArrayList<String> renderedStrings = new ArrayList<String>();
        while(st.hasMoreTokens()){
            String s = st.nextToken();
            if(!AdditionPatternFuntion.render(s, renderedStrings)){
                renderedStrings.add(s);
            }
        }
        for(String s : renderedStrings){
            System.out.print(s);
        }
    }   
}

The class "AdditionPatternFuntion" that does the real work:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdditionPatternFuntion{
    // compiled once; \\d+ also accepts multi-digit operands
    private static final Pattern ADDITION = Pattern.compile("(\\d+)\\+(\\d+)");

    public static boolean render(String s, ArrayList<String> renderedStrings){
        Matcher m = ADDITION.matcher(s);
        boolean match = m.matches();
        if(match){
            // the capturing groups hold the two operands
            int firstOperand = Integer.parseInt(m.group(1));
            int secondOperand = Integer.parseInt(m.group(2));
            renderedStrings.add(String.valueOf(firstOperand + secondOperand));
        }
        return match;
    }
}

When I run it with this input:

Pete likes Mathematics 3+3 and Jessica too 6+3.

I get this output:

Pete likes Mathematics 6 and Jessica too 9.

To handle the "sin()" function you can do the same: create a new class, "SinPatternFunction" for instance, that follows the same approach.

I think you should even create an abstract class "PatternFunction" with an abstract method "render", which you then implement in the addition class and in SinPatternFunction. Finally, you would be able to create a class, let's call it "PatternFunctionHandler", which creates a list of PatternFunction instances (a SinPatternFunction, an addition function, and so on), calls render on each one and returns the result.
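The class layout described above could be sketched roughly as follows. This is only an illustration, not the answer's own code: the names PatternFunctionDemo, PatternFunction, AdditionPatternFunction, SinPatternFunction and PatternFunctionHandler are hypothetical, render is assumed to return null when a token does not match, and sin is assumed to take an integer argument in radians.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternFunctionDemo {

    // Hypothetical base class: one subclass per kind of expression.
    abstract static class PatternFunction {
        // Returns the evaluated text, or null when the token does not match.
        abstract String render(String token);
    }

    static class AdditionPatternFunction extends PatternFunction {
        private static final Pattern ADDITION = Pattern.compile("(\\d+)\\+(\\d+)");

        @Override
        String render(String token) {
            Matcher m = ADDITION.matcher(token);
            if (!m.matches()) {
                return null;
            }
            return String.valueOf(Integer.parseInt(m.group(1)) + Integer.parseInt(m.group(2)));
        }
    }

    static class SinPatternFunction extends PatternFunction {
        private static final Pattern SIN = Pattern.compile("sin\\((\\d+)\\)");

        @Override
        String render(String token) {
            Matcher m = SIN.matcher(token);
            if (!m.matches()) {
                return null;
            }
            // Argument is in radians; Locale.ROOT keeps '.' as the decimal separator.
            return String.format(Locale.ROOT, "%.2f", Math.sin(Double.parseDouble(m.group(1))));
        }
    }

    // Hypothetical handler: tries every registered function on each token.
    static class PatternFunctionHandler {
        private final List<PatternFunction> functions = new ArrayList<>();

        PatternFunctionHandler() {
            functions.add(new AdditionPatternFunction());
            functions.add(new SinPatternFunction());
        }

        String render(String text) {
            // Same delimiters as in the answer; delimiters are returned as tokens too.
            StringTokenizer st = new StringTokenizer(text, " \t\n\r:;.!?,/\\|\"'", true);
            StringBuilder out = new StringBuilder();
            while (st.hasMoreTokens()) {
                String token = st.nextToken();
                String rendered = null;
                for (PatternFunction f : functions) {
                    rendered = f.render(token);
                    if (rendered != null) {
                        break;
                    }
                }
                // Tokens that no function recognizes pass through unchanged.
                out.append(rendered != null ? rendered : token);
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        String input = "Pete likes Mathematics 3+3 and Jessica too sin(3).";
        System.out.println(new PatternFunctionHandler().render(input));
    }
}
```

Adding another operation then means adding one more subclass and registering it in the handler. Note that "/" is among the tokenizer delimiters, so a division function would need a different delimiter set.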

Morgan

This is not a trivial problem to solve as even matching numbers can become quite involved.

Firstly, a number can be matched by the regular expression "(\\d*(\\.\\d*)?\\d(e\\d+)?)" to account for decimal places and exponent formats.
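To see what this expression accepts, here is a small sketch that collects every match in a piece of text. The class name NumberPatternDemo and the helper findNumbers are made up for the illustration; only the regular expression comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberPatternDemo {
    // The same expression as in the answer.
    static final Pattern NUM = Pattern.compile("(\\d*(\\.\\d*)?\\d(e\\d+)?)");

    // Hypothetical helper: returns every substring the pattern finds.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUM.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Integers, decimals, exponents and a leading dot are all matched.
        System.out.println(findNumbers("5+3 and 4.25 or 1e3 or .5")); // [5, 3, 4.25, 1e3, .5]
        // A sentence-ending period is not swallowed, because a digit must come last.
        System.out.println(findNumbers("It costs 3."));               // [3]
    }
}
```

The pattern has no sign and no negative exponent, so "-2" is matched as "2" and "1e-3" as "1" followed by "3".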

Secondly, there are (at least) three types of expressions that you want to solve: binary, unary and functions. For each one, we create a pattern to match in the solve method.

Thirdly, there are numerous libraries that can implement the reduce method; the code below uses exp4j.

The implementation below does not handle nested expressions e.g., sin(5) + cos(3) or spaces in expressions.

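// imports needed by the enclosing class
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import net.objecthunter.exp4j.ExpressionBuilder;

private static final String NUM = "(\\d*(\\.\\d*)?\\d(e\\d+)?)";

public String solve(String expr) {
    expr = solve(expr, "(" + NUM + "(!|\\+\\+|--))"); // unary operators
    // '-' is placed first in the class so it is a literal, not a range from '+' to '/'
    expr = solve(expr, "(" + NUM + "([-+/*]" + NUM + ")+)"); // binary operators
    expr = solve(expr, "((sin|cos|tan)\\(" + NUM + "\\))"); // functions

    return expr;
}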

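private String solve(String expr, String pattern) {
    Matcher m = Pattern.compile(pattern).matcher(expr);
    StringBuffer sb = new StringBuffer();

    // assume a reduce method: String -> String that solves expressions;
    // each match is replaced by its own result (replaceAll would reuse the first one)
    while (m.find()) {
        m.appendReplacement(sb, Matcher.quoteReplacement(reduce(m.group())));
    }
    m.appendTail(sb);
    return sb.toString();
}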

//evaluate expression using exp4j, format to 2 decimal places, 
//remove trailing 0s and dangling decimal point
private String reduce(String expr){
    double res = new ExpressionBuilder(expr).build().evaluate();
    return String.format("%.2f",res).replaceAll("0*$", "").replaceAll("\\.$", ""); 
}
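If pulling in a library is not an option, the reduce step can be hand-written for the simplest cases. The sketch below is an assumption-laden stand-in, not a replacement for exp4j: the class name SimpleSolver and its helpers are hypothetical, it only evaluates a single "number operator number" pair and sin/cos/tan of a plain number (radians), and it ignores unary operators, chains such as 1+2+3, precedence and nesting.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleSolver {
    // Number pattern from the answer, with the inner groups made non-capturing.
    private static final String NUM = "(\\d*(?:\\.\\d*)?\\d(?:e\\d+)?)";
    private static final Pattern BINARY = Pattern.compile(NUM + "([-+/*])" + NUM);
    // \b keeps unknown names such as arcsin(...) from being read as sin(...).
    private static final Pattern FUNCTION = Pattern.compile("\\b(sin|cos|tan)\\(" + NUM + "\\)");

    static String solve(String text) {
        return replaceEach(replaceEach(text, BINARY), FUNCTION);
    }

    // Replaces every match of the pattern with its evaluated value.
    private static String replaceEach(String text, Pattern pattern) {
        Matcher m = pattern.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(reduce(m)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Hand-written stand-in for the library call: evaluates one match.
    private static String reduce(Matcher m) {
        double result;
        if (m.pattern() == FUNCTION) {
            double arg = Double.parseDouble(m.group(2));
            switch (m.group(1)) {
                case "sin": result = Math.sin(arg); break;
                case "cos": result = Math.cos(arg); break;
                default:    result = Math.tan(arg); break;
            }
        } else {
            double left = Double.parseDouble(m.group(1));
            double right = Double.parseDouble(m.group(3));
            switch (m.group(2).charAt(0)) {
                case '+': result = left + right; break;
                case '-': result = left - right; break;
                case '*': result = left * right; break;
                default:  result = left / right; break;
            }
        }
        // Two decimal places, then drop trailing zeros and a dangling decimal point.
        return String.format(Locale.ROOT, "%.2f", result)
                .replaceAll("0*$", "")
                .replaceAll("\\.$", "");
    }

    public static void main(String[] args) {
        System.out.println(solve("Pete like mathematic 5+3 and jesica too sin(3)."));
    }
}
```

Locale.ROOT is used so the decimal separator is always a dot; producing "0,14" as in the question would mean formatting with a locale that uses a comma and stripping a trailing comma instead of a trailing dot.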
Aimee Borda
  • Hey! Thank you for the answer. I do not fully understand how to implement the reduce method in my project. Can you give me a little example? – iliya.rudberg Jan 06 '17 at 15:46
  • There are numerous libraries that, given a string `3+4`, evaluate the expression and return `7`. I updated the answer to use one of these libraries, exp4j. In `reduce` you can implement however you want to evaluate individual expressions – Aimee Borda Jan 06 '17 at 15:49
  • Thanks! Cool library, I did it this way and everything works. – iliya.rudberg Jan 08 '17 at 02:07

Your specified requirement is to use regular expressions to:

  1. Divide text into components (words, ...)
  2. Return text with inner arithmetic expressions evaluated

You have started on the first step using regular expressions but have not quite completed it. Once it is complete, what remains is to:

  1. Recognize and parse components that form arithmetic (sub)expressions.
  2. Evaluate recognized (sub)expression components and produce a value. For evaluating (sub)expressions in infix notation, there exists a very helpful answer.
  3. Substitute the computed values back into the original string, which should be simple.

To divide the text into components defined strictly enough to allow later unambiguous evaluation of the subexpressions, I coded a sample, trying out named capturing groups in Java. This sample handles only integer numbers, but floating point should be simple to add.
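One possible way to add floating point is to let the number group optionally consume a dot followed by digits. The sketch below is a cut-down variant written for illustration (the class name FloatLexer is hypothetical, and the choice to require digits on both sides of the dot is an assumption), not the sample described in this answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FloatLexer {
    // Number with an optional fraction; digits are required on both sides of the dot,
    // so a sentence-ending period after a number is still reported as DOT.
    static final String S_NUM = "(?<num>\\d+(?:\\.\\d+)?)";
    static final String S_FUNC = "(?<func>sin|cos|tan)(?<fopenp>\\()";
    static final String S_OPENP = "(?<openp>\\()";
    static final String S_CLOSEP = "(?<closep>\\))";
    static final String S_DOT = "(?<dot>\\.)";
    static final String S_PUNCT = "(?<punct>[,!?;:'\"])";
    static final String S_WS = "(?<ws>\\s+)";
    static final String S_OP = "(?<op>[-+*/])";
    static final String S_WORD = "(?<word>[a-zA-Z]+)";

    static final Pattern ALL = Pattern.compile(String.join("|",
            S_OPENP, S_CLOSEP, S_FUNC, S_NUM, S_DOT, S_PUNCT, S_WS, S_OP, S_WORD));

    // Group names are listed by hand, in the order they should be reported.
    static final List<String> GROUPS = Arrays.asList(
            "func", "fopenp", "openp", "closep", "dot", "punct", "ws", "num", "op", "word");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = ALL.matcher(input);
        while (m.find()) {
            for (String group : GROUPS) {
                String value = m.group(group);
                if (value != null) {
                    tokens.add(group.toUpperCase() + "('" + value + "')");
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", tokenize("Or arcsin(4.2) we do not know about?")));
        System.out.println(String.join(",", tokenize("sin(3).")));
    }
}
```

With this change "4.2" comes out as a single NUM('4.2') token, while the final period in "sin(3)." is still a separate DOT('.').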

Sample output on some test inputs was as follows:

Matching 'Pete like mathematic 5+3 and jesica too sin(3).'
WORD('Pete'),WS(' '),WORD('like'),WS(' '),WORD('mathematic'),WS(' '),NUM('5'),OP('+'),NUM('3'),WS(' '),WORD('and'),WS(' '),WORD('jesica'),WS(' '),WORD('too'),WS(' '),FUNC('sin'),FOPENP('('),NUM('3'),CLOSEP(')'),DOT('.')
Matching 'How about solving sin(3 + cos(x)).'
WORD('How'),WS(' '),WORD('about'),WS(' '),WORD('solving'),WS(' '),FUNC('sin'),FOPENP('('),NUM('3'),WS(' '),OP('+'),WS(' '),FUNC('cos'),FOPENP('('),WORD('x'),CLOSEP(')'),CLOSEP(')'),DOT('.')
Matching 'Or arcsin(4.2) we do not know about?'
WORD('Or'),WS(' '),WORD('arcsin'),OPENP('('),NUM('4'),DOT('.'),NUM('2'),CLOSEP(')'),WS(' '),WORD('we'),WS(' '),WORD('do'),WS(' '),WORD('not'),WS(' '),WORD('know'),WS(' '),WORD('about'),PUNCT('?')
Matching ''sin sin sin' the catholic priest has said...'
PUNCT('''),WORD('sin'),WS(' '),WORD('sin'),WS(' '),WORD('sin'),PUNCT('''),WS(' '),WORD('the'),WS(' '),WORD('catholic'),WS(' '),WORD('priest'),WS(' '),WORD('has'),WS(' '),WORD('said'),DOT('.'),DOT('.'),DOT('.')

Regarding named capturing groups, I found it inconvenient that neither the compiled Pattern nor the acquired Matcher API provides access to the names of the groups present. Sample code below.

import java.util.*;
import java.util.regex.*;

import static java.util.stream.Collectors.joining;

public class Lexer {
    // differentiating _function call opening parentheses_ from expressions one
    static final String S_FOPENP = "(?<fopenp>\\()";
    static final String S_FUNC = "(?<func>(sin|cos|tan))" + S_FOPENP;
    // expression or text opening parentheses
    static final String S_OPENP = "(?<openp>\\()";
    // expression or text closing parentheses
    static final String S_CLOSEP = "(?<closep>\\))";
    // separate dot, should help with introducing floating-point support
    static final String S_DOT = "(?<dot>\\.)";
    // other recognized punctuation
    static final String S_PUNCT = "(?<punct>[,!?;:'\"])";
    // whitespace
    static final String S_WS = "(?<ws>\\s+)";
    // integer number pattern
    static final String S_NUM = "(?<num>\\d+)";
    // treat '* / + -' as mathematical operators. Can be in dashed text.
    static final String S_OP = "(?<op>\\*|/|\\+|-)";
    // word -- refrain from using \w character class that also includes digits
    static final String S_WORD = "(?<word>[a-zA-Z]+)";

    // put the predefined components together into single regular expression
    private static final String S_ALL = "(" +
        S_OPENP + "|" + S_CLOSEP + "|" + S_FUNC + "|" + S_DOT + "|" +
        S_PUNCT + "|" + S_WS + "|" + S_NUM + "|" + S_OP + "|" + S_WORD +
    ")";
    static final Pattern ALL = Pattern.compile(S_ALL); // ... & form Pattern

    // named capturing groups defined in regular expressions
    static final List<String> GROUPS = Arrays.asList(
        "func", "fopenp",
        "openp", "closep",
        "dot", "punct", "ws",
        "num", "op",
        "word"
    );
    // divide match into components according to capturing groups
    static final List<String> tokenize(Matcher m) {
        List<String> tokens = new LinkedList<>();
        while (m.find()){
            for (String group : GROUPS) {
                String grResult = m.group(group);
                if (grResult != null)
                    tokens.add(group.toUpperCase() + "('" + grResult + "')");
            }
        }

        return tokens;
    }

    // some sample inputs to test
    static final List<String> INPUTS = Arrays.asList(
        "Pete like mathematic 5+3 and jesica too sin(3).",
        "How about solving sin(3 + cos(x)).",
        "Or arcsin(4.2) we do not know about?",
        "'sin sin sin' the catholic priest has said..."
    );

    // test
    public static void main(String[] args) {
        for (String input: INPUTS) {
            Matcher m = ALL.matcher(input);
            System.out.println("Matching '" + input + "'");
            System.out.println(tokenize(m).stream().collect(joining(",")));
        }
    }
}
unserializable