MID4BNW2Uq-01;Standard Offline - Acc 01;SA\;BATE:GOOGN

I'm trying to split the above line on semicolons like so: line.split(";", -1).

The resulting list that I need is:

1. MID4BNW2Uq-01
2. Standard Offline - Acc 01
3. SA\;BATE:GOOGN

But instead, I get one more element because of that ";" inside SA\;BATE:GOOGN:

1. MID4BNW2Uq-01
2. Standard Offline - Acc 01
3. SA\
4. BATE:GOOGN

I'm looking for a way to make the .split method match ";" BUT NOT "\;". In other words, split on the semicolon (;) only if there's no "\" right before it.

I've thought about using regex but I'm at a complete loss when it comes to it. Any help would be much appreciated. Thank you!

BDL
George Cimpoies
  • split("[^\\]?") maybe? Not entirely sure about Java regex syntax, but it should be like this. – daniu Aug 23 '17 at 12:36
  • I've tried it but I get a red squiggly line saying "Illegal escape character in string literal" – George Cimpoies Aug 23 '17 at 12:38
  • @GeorgeCimpoies work with `(?<!\\);` or `(?<!\\\\);` ? If second one you may accept my answer ^^ – azro Aug 23 '17 at 12:52
  • It is much safer to *match* using [`String pat = "(?s)(?:[^;\\\\]|\\\\.)+";`](https://regex101.com/r/9plWhS/1) pattern. – Wiktor Stribiżew Aug 23 '17 at 12:52
  • Please don't edit your question to include an answer. If you have found a solution that wasn't proposed by any of the other answers feel free to add one. – BDL Aug 23 '17 at 13:34

1 Answer

What you're looking for is a zero-length assertion called "negative lookbehind".

For example,

(?<!a)b

matches a "b" that is not preceded by an "a", using negative lookbehind.
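To see the lookbehind in isolation, here is a minimal Java sketch. The class name LookbehindDemo and the helper unprecededB are made up for illustration; only java.util.regex.Pattern and Matcher are real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Hypothetical helper: returns the start index of every "b"
    // that is not immediately preceded by an "a".
    static List<Integer> unprecededB(String input) {
        Matcher m = Pattern.compile("(?<!a)b").matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The "b" at index 1 follows an "a" and is skipped.
        System.out.println(unprecededB("ab cb b"));
    }
}
```

The lookbehind consumes no characters, so each match is just the "b" itself.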

Try splitting on this:

(?<!\\);

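The backslash is a special character in regular expressions, so it must be escaped with an extra backslash. In a Java string literal, each of those two backslashes must be escaped again, so the pattern is written as "(?<!\\\\);".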
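Applied to the line from the question, the split might look like the sketch below. It assumes Java (as the comments suggest); the class name SplitUnescaped and the method splitOnUnescapedSemicolons are hypothetical names for illustration.

```java
class SplitUnescaped {
    // Hypothetical helper: splits on every ";" that is not immediately
    // preceded by a backslash. The regex is (?<!\\); and each backslash
    // is doubled again for the Java string literal.
    static String[] splitOnUnescapedSemicolons(String line) {
        // Limit -1 keeps trailing empty strings, as in the question.
        return line.split("(?<!\\\\);", -1);
    }

    public static void main(String[] args) {
        // "\\;" in the literal is the two characters \; in the actual string.
        String line = "MID4BNW2Uq-01;Standard Offline - Acc 01;SA\\;BATE:GOOGN";
        for (String part : splitOnUnescapedSemicolons(line)) {
            System.out.println(part);
        }
    }
}
```

Note that this treats any semicolon directly after a backslash as escaped, so it will not split after an escaped backslash (\\;) if the data can contain one. The match-based pattern suggested in the comments is one way to handle that case.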

neuhaus