5

I'm stuck with this regex.

I have input like this:

  • "Crane device, (physical object)"(X1,x2,x4), not "Seen by research nurse (finding)", EntirePatellaBodyStructure(X1,X8), "Besnoitia wallacei (organism)", "Catatropis (organism)"(X1,x2,x4), not IntracerebralRouteQualifierValue, "Diospyros virginiana (organism)"(X1,x2,x4), not SuturingOfHandProcedure(X1)

and in the end I would like to get:

  • "Crane device, (physical object)"(X1,x2,x4)
  • not "Seen by research nurse (finding)"
  • EntirePatellaBodyStructure(X1,X8)
  • "Besnoitia wallacei (organism)"
  • "Catatropis (organism)"(X1,x2,x4)
  • not IntracerebralRouteQualifierValue
  • "Diospyros virginiana (organism)"(X1,x2,x4)
  • not SuturingOfHandProcedure(X1)

I've tried this regex:

(\'[^\']*\')|(\"[^\"]*\")|([^,]+)|\\s*,\\s*

It works only as long as there is no comma inside the parentheses.

user207421
Vadim Ivanov

4 Answers

3

RegEx

(\w+\s)?("[^"]+"|\w+)(\(\w\d(,\w\d)*\))?

Java Code

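import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = ... ;
// Each find() yields one item: optional "not " prefix, name, optional argument list.
Matcher m = Pattern.compile(
          "(\\w+\\s)?(\"[^\"]+\"|\\w+)(\\(\\w\\d(,\\w\\d)*\\))?").matcher(input);
while (m.find()) {
    System.out.println(m.group());
}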

Output

"Crane device, (physical object)"(X1,x2,x4)
not "Seen by research nurse (finding)"
EntirePatellaBodyStructure(X1,X8)
not "Besnoitia wallacei (organism)"(X1,x2,x4)
not "Catatropis (organism)"(X1,x2,x4)
not IntracerebralRouteQualifierValue
not "Diospyros virginiana (organism)"(X1,x2,x4)
not SuturingOfHandProcedure(X1)
Ravi K Thapliyal
  • With the example I gave at the beginning, your regex works perfectly. Unfortunately, it does not work with the updated example, specifically the part without any quotes. But thank you anyway, I'll try to improve it. – Vadim Ivanov May 23 '13 at 18:24
  • Check update. Added regex as per your new requirements. – Ravi K Thapliyal May 23 '13 at 18:38
1

Don't use regexes for this. Write a simple parser that keeps track of the number of parentheses encountered, and whether or not you are inside quotes. For more information, see: RegEx match open tags except XHTML self-contained tags
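A minimal sketch of such a parser, assuming quotes are never escaped inside quoted text and that parentheses inside quotes should be ignored. The class and method names (TopLevelSplitter, split) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelSplitter {

    /** Splits on commas that are outside double quotes and outside parentheses. */
    public static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // number of currently open parentheses
        boolean inQuotes = false; // true between an opening and a closing double quote

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;          // assumption: no escaped quotes
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')' && depth > 0) {
                depth--;
            }
            if (c == ',' && !inQuotes && depth == 0) {
                addTrimmed(parts, current);    // top-level comma ends the item
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addTrimmed(parts, current);            // last item has no trailing comma
        return parts;
    }

    /** Adds the buffered text without surrounding whitespace; skips empty items. */
    private static void addTrimmed(List<String> parts, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            parts.add(s);
        }
    }

    public static void main(String[] args) {
        String input = "\"Crane device, (physical object)\"(X1,x2,x4), "
                + "not \"Seen by research nurse (finding)\", "
                + "EntirePatellaBodyStructure(X1,X8)";
        for (String part : split(input)) {
            System.out.println(part);
        }
    }
}
```

Run on that shortened input, it prints the three items on separate lines, keeping the commas inside the quotes and inside the argument lists.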

Community
We Are All Monica
0

Would this do what you need?

System.out.println(yourString.replaceAll(", not", "\nnot"));
John484
0

Assuming that there is no possibility of nesting () within (), and no possibility of (say) \" within "", you can write something like:

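import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A token is a run of quoted strings, parenthesized groups, and other characters.
// ',' is excluded from the last alternative so that a top-level comma ends the token;
// without that exclusion the whole input would match as a single token.
private static final Pattern CUSTOM_SPLIT_PATTERN =
    Pattern.compile("\\s*((?:\"[^\"]*\"|[(][^)]*[)]|[^\"(,]+)+)");
private static String[] customSplit(final String input) {
    final List<String> ret = new ArrayList<String>();
    final Matcher m = CUSTOM_SPLIT_PATTERN.matcher(input);
    while(m.find()) {
        ret.add(m.group(1));
    }
    return ret.toArray(new String[ret.size()]);
}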

(disclaimer: not tested).
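Since the pattern above is posted untested, here is a small runnable check of the same idea. It is only a sketch: it assumes the last alternative excludes commas (`[^\"(,]+`), so that a top-level comma ends a token, and the class name CustomSplitDemo is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomSplitDemo {

    // Assumption: ',' is excluded from the last alternative so that a
    // comma outside quotes and parentheses terminates the current token.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*((?:\"[^\"]*\"|[(][^)]*[)]|[^\"(,]+)+)");

    static List<String> customSplit(String input) {
        List<String> ret = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            ret.add(m.group(1)); // group 1 omits the leading whitespace
        }
        return ret;
    }

    public static void main(String[] args) {
        String input = "\"Catatropis (organism)\"(X1,x2,x4), "
                + "not IntracerebralRouteQualifierValue, "
                + "\"Besnoitia wallacei (organism)\"";
        for (String token : customSplit(input)) {
            System.out.println(token);
        }
    }
}
```

Leading whitespace is skipped by the `\\s*` prefix, but whitespace before a comma would stay in the token, so call `trim()` on the results if the input may contain it.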

ruakh