3

Is there any way to split commas outside quotes, and ignore double single quotes within the quotes? This would be really useful when manipulating with SQL.

I'm trying to split SQL statements, and SQL uses single quote to escape single quote within strings.

e.g.

String source = "ADDRESS.CITY || ', UK''s', ADDRESS.CITY || ', US''s', ADDRESS.CITY || ', UK''s'";

String[] expected = new String[]{
"ADDRESS.CITY || ', UK''s'", 
"ADDRESS.CITY || ', US''s'",
"ADDRESS.CITY || ', UK''s'"
};

String[] result = splitElements(source);
assert expected.equals(result);

I have tried Splitting on comma outside quotes and changed it to single quotes i.e.

source.split(",(?=(?:[^\']*\'[^\']*\')*[^\']*$)")

The problem is it doesn't ignore double single quotes.

I have also tried to combine it with Split with single colon but not double colon using regex, but wouldn't be able to get it working so far.

Community
  • 1
  • 1
Max
  • 1,064
  • 9
  • 23
  • For whoever read my original post, @Bohemian pointed out my mistake on the 'source' and 'expected'. I missed a single quote at the end of the string. Sorry for the confusion. This has been corrected. – Max Sep 05 '15 at 15:57

3 Answers3

2

Use this regex for the split:

String[] parts = source.split(", *(?=((([^']|'')*'){2})*([^']|'')*$)");

This regex uses a look ahead that asserts the number of quotes following the current position is even, which logically means the comma is not enclosed.

The "key" here is using an alternation to define a "non quote" as either [^'] or '', which means double quotes are consumed/treated as if they are a single character.

Note:

There is a missing final quote in your test case, which I have repaired in the test code below. If the quote is not added, your test case is syntactically invalid SQL and this code relies on quotes being balanced.


Some test code:

String source = "ADDRESS.CITY || ', UK''s', ADDRESS.CITY || ', US''s', ADDRESS.CITY || ', UK''s'";
String[] parts = source.split(", *(?=((([^']|'')*'){2})*([^']|'')*$)");
Arrays.stream(parts).forEach(System.out::println);

Output:

ADDRESS.CITY || ', UK''s'
ADDRESS.CITY || ', US''s'
ADDRESS.CITY || ', UK''s'
Community
  • 1
  • 1
Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • This is not giving expected answer. `ADDRESS.CITY || '` `UK''s', ADDRESS.CITY || '` `US''s', ADDRESS.CITY || '` `UK''s`. It is returning four elements after split. – YoungHobbit Sep 04 '15 at 17:08
  • 1
    @abhishekbafna there is an error in your test input - there's a quote missing at the end (which makes it invalid input). See edits to my answer. – Bohemian Sep 04 '15 at 17:13
  • 1
    @Bohemian Thank you so much, for also correcting my mistake. Sorry about posting something with an error that confuses people. I have tried some of my other test cases and they all work as expected. – Max Sep 05 '15 at 15:50
  • 1
    @abhishekbafna Sorry for my mistakes. Bohemian is right. – Max Sep 05 '15 at 15:51
  • @Bohemian If you are interested and don't mind taking a look, I have posted another challenge based on the current one: http://stackoverflow.com/questions/32415746/splitting-on-comma-either-outside-quotes-when-escaped-quotes-exist-or-outside-b – Max Sep 05 '15 at 17:23
  • @Ming regarding the other Q, because open and close brackets are different characters and because literal brackets within quotes don't count and can be imbalanced, you are in to the realm of parsing, not pattern matching, so a regex solution isn't feasible. Something like the answer posted there is the only way. – Bohemian Sep 05 '15 at 18:29
  • @Bohemian once again, thank you so much for your explanation – Max Sep 05 '15 at 18:35
  • @Bohemian someone actually solved the problem using regex on the other Q. Thought you might be interested. Cheers – Max Sep 06 '15 at 12:39
  • @Ming no they didn't solve it. Try their solution with this input `"'I am sad :(', to_char(DATE, 'YYYY,MM,DD')"`. The regex is OK, but doesn't solve the general case, because (as I said earlier), you need to *parse* the input to determine with certainty the syntactic tokens of the input. You are basically asking for an SQL parser. – Bohemian Sep 06 '15 at 15:08
  • @Bohemian Thanks again. The Java split solution seems to work. Just to let you know that they have updated the regex one to also solve the problem. I wouldn't be able to find a runtime SQL parser. Just wondering whether you have come across any. – Max Sep 07 '15 at 09:16
1

Just replace double single quotes with some another char (like chr(1)), then split whole string via your's regex, and then remap splitted chunks by replacing that (chr(1)) back to double single quotes.

ankhzet
  • 2,517
  • 1
  • 24
  • 31
0

You should try this pattern:

, ([A-Z.]{12})

see the link below for more information https://regex101.com/r/dB9xH2/1

ps: don't forget to use \1 to replace with the firt pattern found.

Alvaro Silvino
  • 9,441
  • 12
  • 52
  • 80