
I have a string that looks like this: String s = "date1, calculatedDate(currentDate, 35), false";

I need to extract all the parameters of a verify function call, so the expected result is:

elem[0] = date1
elem[1] = calculatedDate(currentDate, 35)
elem[2] = false

If I use the split function on the , character, I get this result instead:

elem[0] = date1
elem[1] = calculatedDate(currentDate
elem[2] =  35)
elem[3] = false

Moreover, the method has to be generic, because some functions have 2 or 7 parameters...

Do you have any solution to help me with that?

Royce
  • This is just an example, but can you also have literal strings? Complex expressions? – Maurice Perry Dec 20 '19 at 13:15
  • The question is clear and shows your own attempt, although it lacks a [example]. Please add more on context and usage: would you like to parse more function calls or other syntax too? Which source language is it (i.e. the `verify` function)? – hc_dev Dec 20 '19 at 13:16
  • @Royce Please [edit] your question to include more example input data with simple and complex structures, so we can see what the inputs are. This might go in the direction of "parsing" the input string, which might not be easy to do. Where do you get the string from anyway? Maybe this is an "XY problem" and can be solved much more easily somewhere else. – Progman Dec 20 '19 at 15:31
  • @Royce Please tell us: __where do the function calls come from__ (context) and __which language__ are they in (_Javascript_, _SQL_)? Many special libraries are made for such parsing. _Regex or String operations_ may not be suitable enough :-( – hc_dev Dec 20 '19 at 15:47

2 Answers


Try this:

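import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "verify(date1, calculatedDate(currentDate, 35), false)";
// group 1: first argument, group 3: everything in between, group 5: last argument
Pattern p = Pattern.compile("(?<=verify\\()(\\w+)(,\\s)(.*)(,\\s)((?<=,\\s)\\w+)(?=\\))");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group(1) + "\n" + m.group(3) + "\n" + m.group(5));
}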

Update for s = "date1, calculatedDate(currentDate, 35), false":

String s = "date1, calculatedDate(currentDate, 35), false"; 
Pattern p = Pattern.compile("(\\w+)(,\\s)(.*)(,\\s)((?<=,\\s)\\w+)");
Matcher m = p.matcher(s);
while(m.find()) {
    System.out.println(m.group(1) + "\n" + m.group(3) + "\n" + m.group(5));
}

Output:

date1
calculatedDate(currentDate, 35)
false

About regex:

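  • (\\w+) captures one or more (+) word characters: the first argument
  • (,\\s) matches the first comma and the whitespace character after it
  • (.*) greedily matches any characters, so it captures everything between the first and the last ", " separator; with more than three arguments, all the middle ones end up in this one group
  • (,\\s) matches the last comma and the whitespace character after it
  • ((?<=,\\s)\\w+) captures the last argument (here false); (?<=,\\s) is a positive lookbehind that requires ", " directly before the word without including it in the group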
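Since the pattern above expects exactly three arguments, splitting on top-level commas may be a more generic alternative. The following is only a minimal sketch and not part of the original answer: the class and method names are made up for illustration, and the lookahead assumes the arguments contain at most one level of parentheses (a comma followed by another opening parenthesis inside a call, as in f(a, g(b)), would be split wrongly).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, named for illustration only.
class RegexArgumentSplitter {

    // Splits on a comma (with optional surrounding whitespace) unless the text
    // after it reaches a ')' before any other parenthesis, which would mean
    // the comma sits inside a parenthesised argument list.
    static List<String> splitArguments(String text) {
        return Arrays.asList(text.trim().split("\\s*,\\s*(?![^()]*\\))"));
    }

    public static void main(String[] args) {
        String s = "date1, calculatedDate(currentDate, 35), false";
        // prints one argument per line
        for (String arg : splitArguments(s)) {
            System.out.println(arg);
        }
    }
}
```

For the string from the question this prints date1, calculatedDate(currentDate, 35) and false on separate lines, and it works the same way for 2 or 7 arguments as long as the nesting assumption holds.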
Hülya
  • Thank you. I edited my question. Indeed, I don't need to parse `verify`, only the args. Could you please adapt your answer? – Royce Dec 20 '19 at 13:43
  • It was not specified, but what happens when parsing more or fewer parameters, like `date1, calculatedDate(currentDate, 35), false, true`? Does it catch them all? – hc_dev Dec 20 '19 at 13:56
  • Yes, all of them have to be captured, even if there are 2 or 7 parameters. I edited the question with this information. – Royce Dec 20 '19 at 14:04

You could use StringTokenizer to parse your arguments inside the parentheses:

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

static final String DELIMITER = ",";
static final String PARENTHESES_START = "(";
static final String PARENTHESES_END = ")";

public static List<String> parseArguments(String text) {
    List<String> arguments = new ArrayList<>();
    StringBuilder argParsed = new StringBuilder();

    StringTokenizer st = new StringTokenizer(text, DELIMITER);
    while (st.hasMoreElements()) {
        // default: add next token
        String token = st.nextToken();
        System.out.println("Token: " + token);
        argParsed.append(token);

        // if token contains '(' we have
        // an expression or nested call as argument 
        if (token.contains(PARENTHESES_START)) {
            System.out.println("Nested expression with ( starting: " + token);

            // reconstruct to string-builder until ')'
            while(st.hasMoreElements() && !token.contains(PARENTHESES_END)) {
                // add eliminated/tokenized delimiter 
                argParsed.append(DELIMITER);

                // default: add next token
                token=st.nextToken();
                System.out.println("Token inside nested expression: " + token);
                argParsed.append(token);
            }
            System.out.println("Nested expression with ) ending: " + token);
        }

        // add complete argument (trimmed, because tokens keep the space
        // that follows each comma) and start fresh
        arguments.add(argParsed.toString().trim());
        argParsed.setLength(0);
    }

    return arguments;
}

It can even parse the following input: date1, calculatedDate(currentDate, 35), false, (a+b), x.toString()

It successfully finds all 5 arguments, including complex ones:

  • (nested) function-calls like calculatedDate(currentDate, 35)
  • expressions like (a+b)
  • method-calls on objects like x.toString()

Run this demo on IDEone.

Read more and extend

You may have to handle more complex texts or grammars in the future. If neither regex capturing, string splitting, nor tokenizing solves the problem, consider using or generating a PEG or CFG parser. See the discussion about Regular Expression Vs. String Parsing.
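Before reaching for a parser generator, one intermediate step could be a hand-written scanner that counts parenthesis depth and splits only on commas at depth zero. This is a sketch under stated assumptions, not code from the demo: ArgumentSplitter and splitTopLevel are hypothetical names, and neither string literals containing commas or parentheses nor unbalanced input are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named for illustration only.
class ArgumentSplitter {

    // Walks the text once and splits on commas that are not enclosed
    // in any pair of parentheses, however deeply the calls are nested.
    static List<String> splitTopLevel(String text) {
        List<String> arguments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // number of currently open parentheses

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }

            if (c == ',' && depth == 0) {
                // top-level comma: the current argument is complete
                arguments.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // the last argument has no trailing comma
        arguments.add(current.toString().trim());
        return arguments;
    }

    public static void main(String[] args) {
        // prints one argument per line
        for (String arg : splitTopLevel("f(g(a, b), c), date1, (a+b)")) {
            System.out.println(arg);
        }
    }
}
```

Unlike a check for the first token that contains a closing parenthesis, the depth counter keeps an argument such as f(g(a, b), c) in one piece. Note that an empty input yields a list with a single empty string in this sketch.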

hc_dev