STOPATDESK YES;
:: TXT "LCLLMT:29.4700";
:: TXT "LCLCURR;NON-USD";
:: TXT "CALLBK:3";
:: TXT "FFTRL:EUR-LIM;-TAP-5";

STOPATDESK YES; :: TXT "LCLLMT:29.4700"; :: TXT "LCLCURR;NON-USD"; :: TXT "CALLBK:3"; :: TXT "FFTRL:EUR-LIM;-TAP-5";

Could you please provide a regex that matches semicolons, but not those inside TXT "..."?

There are several useful questions on StackOverflow, but I failed to put together a working solution for my case:
Regex for matching a character, but not when it's enclosed in square bracket
Regex for matching a character, but not when it's enclosed in quotes

Mike
  • Just match? Easy: `"TXT\\s*\"[^\"]*\"|(;)"` and grab `.group(1)`. – Wiktor Stribiżew Sep 09 '15 at 13:33
  • I want to use the regex pattern in `String.split(String regex)` – Mike Sep 09 '15 at 13:34
  • Try using [`s.split("(?<!TXT \"[^\"]{0,1000});")`](http://ideone.com/iXXCVc). If the `TXT "...` values are no longer than 1000 characters, that might work in this case. But I do not think a constrained-width look-behind is that reliable. – Wiktor Stribiżew Sep 09 '15 at 13:43
  • I don't quite understand. Do you want to match the semicolons at the end of the lines? – stann1 Sep 09 '15 at 13:21
  • Yes, I want to match semicolons, but they are not necessarily at the end of a line. – Mike Sep 09 '15 at 13:27
  • Look at Max's answer. If your semicolons can appear anywhere, it will be very difficult to match them with a regex. Have a look at this regex tutorial if you need a more complex pattern: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial – stann1 Sep 09 '15 at 13:39
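To try the bounded look-behind from Wiktor's comment with String.split, here is a minimal sketch. It assumes, as the comment does, that no TXT "..." value is longer than 1000 characters; the class and method names are made up for illustration.

```java
public class LookbehindSplit {
    // A ';' is NOT a separator when it is preceded by TXT " followed by
    // 0..1000 non-quote characters, i.e. when it sits inside a TXT "..." value.
    // Assumption (from the comment): no TXT "..." value exceeds 1000 characters.
    static final String SEPARATOR = "(?<!TXT \"[^\"]{0,1000});";

    static String[] split(String input) {
        // String.split drops trailing empty strings, so a final ';' adds no element.
        return input.split(SEPARATOR);
    }

    public static void main(String[] args) {
        String line = "STOPATDESK YES; :: TXT \"LCLLMT:29.4700\"; :: TXT \"LCLCURR;NON-USD\";"
                + " :: TXT \"CALLBK:3\"; :: TXT \"FFTRL:EUR-LIM;-TAP-5\";";
        for (String part : split(line)) {
            System.out.println("[" + part.trim() + "]");
        }
    }
}
```

The look-behind only works because Java accepts look-behinds with a bounded maximum length; an unbounded `[^"]*` there would not compile.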

2 Answers


You need a regex that matches any semicolon that is not followed by an odd number of quotes.

;(?![^"]*"(?:[^"]*"[^"]*")*[^"]*$)

The tricky part is the negative lookahead (?![^"]*"(?:[^"]*"[^"]*")*[^"]*$):

  • [^"]*" matches the text after ; up to and including the next "
  • (?:[^"]*"[^"]*")* matches any number of further pairs of quotes, with the text around them
  • [^"]*$ matches the rest of the input, which must contain no more quotes

If the whole lookahead matches, an odd number of " follows the ;. That means the ; sits between two quotes, so it is not a valid separator and is skipped.
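A minimal sketch of the quote-parity idea with String.split. It uses the lookahead ;(?![^"]*"(?:[^"]*"[^"]*")*[^"]*$), which counts the quotes after each ; in pairs, so any odd count (1, 3, 5, ...) marks the ; as quoted. It assumes quotes are always balanced and never escaped; the class name is hypothetical.

```java
public class QuoteParitySplit {
    // A ';' is a separator unless an odd number of '"' follows it up to the end
    // of the input, which would mean it sits inside a quoted value.
    //   [^"]*"              text up to and including the next quote
    //   (?:[^"]*"[^"]*")*   any number of further quote pairs
    //   [^"]*$              the rest of the input, with no more quotes
    static final String SEPARATOR = ";(?![^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String input) {
        return input.split(SEPARATOR);
    }

    public static void main(String[] args) {
        String text = "STOPATDESK YES;\n"
                + ":: TXT \"LCLLMT:29.4700\";\n"
                + ":: TXT \"LCLCURR;NON-USD\";\n"
                + ":: TXT \"CALLBK:3\";\n"
                + ":: TXT \"FFTRL:EUR-LIM;-TAP-5\";";
        for (String part : split(text)) {
            System.out.println("[" + part.trim() + "]");
        }
    }
}
```

Each [^"]* is followed by a literal quote or by $, so there is only one way to match each run of text, which keeps backtracking small on long inputs.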

Regex: https://regex101.com/r/dG6cC1/1

Java demo: https://ideone.com/OuAaA5

Tobías

You can also try:

"[^"]*"|(;)

which matches either a quoted string or a standalone semicolon; the standalone semicolons are then available through group(1). However, unbalanced quotation marks would cause a problem. Alternatively, if the whole file is formatted like your example (semicolons inside quotes are preceded and followed by another character, never by whitespace), you can try:

;(?=\s|$)

It works with the example above.
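String.split cannot use a capturing group to decide which matches are separators, so the "[^"]*"|(;) approach needs a Matcher loop instead. A minimal sketch, assuming balanced, unescaped quotes; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSplit {
    // Either a whole quoted string, or a bare ';' captured in group 1.
    // Quoted strings are consumed first, so a ';' inside them never reaches (;).
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|(;)");

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int start = 0;
        while (m.find()) {
            if (m.group(1) != null) { // only bare semicolons are separators
                parts.add(input.substring(start, m.start(1)));
                start = m.end(1);
            }
        }
        if (start < input.length()) {
            parts.add(input.substring(start)); // text after the last separator, if any
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "STOPATDESK YES; :: TXT \"LCLLMT:29.4700\"; :: TXT \"LCLCURR;NON-USD\";"
                + " :: TXT \"CALLBK:3\"; :: TXT \"FFTRL:EUR-LIM;-TAP-5\";";
        for (String part : split(line)) {
            System.out.println("[" + part.trim() + "]");
        }
    }
}
```

If the simpler ;(?=\s|$) rule fits your data, no group is needed and s.split(";(?=\\s|$)") works directly.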

m.cekiera