
I want to split a string on spaces, ignoring spaces that are inside single quotes and ignoring single quotes that are escaped (i.e., \'). I have the following, adapted from another question.

    String s = "Some message I want to split 'but keeping this a\'s a single string' Voila!";
    for (String a : s.split(" (?=([^\']*\'[^\"]*\')*[^\']*$)")) {
        System.out.println(a);
    }

The output of the above code is

Some
message
I
want
to
split
'but
keeping
this
'a's a single string'
Voila!

However, I need single quotes to be ignored if they are escaped (\'), which the above does not do. I also need the first and last single quotes removed, along with any backslashes, if and only if those backslashes are escaping a single quote (so that 'this is a \'string' would become this is a 'string). I don't know how to use regex. How would I accomplish this?

Tyler Senter

3 Answers


You need to use a negative lookbehind to take care of escaped single quotes:

String str = 
        "Some message I want to split 'but keeping this a\\'s a single string' Voila!";

String[] toks = str.split( " +(?=((.*?(?<!\\\\)'){2})*[^']*$)" );
for (String tok: toks)
    System.out.printf("<%s>%n", tok);

output:

<Some>
<message>
<I>
<want>
<to>
<split>
<'but keeping this a\'s a single string'>
<Voila!>

PS: As you noted, an escaped single quote needs to be typed as \\' in a Java string literal; otherwise \' is treated as a plain '
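One caveat with the split pattern above: `.*?` can run across quote characters, so when the input has more than one quoted section, spaces inside the earlier sections can still be split. A minimal sketch of one way around that, assuming a backslash always escapes the character after it, is to spell out what may appear between quotes so that only unescaped quotes are counted. The class name `QuoteAwareSplit` and the helper constant `UNIT` are illustrative and not part of the original answer:

```java
import java.util.Arrays;
import java.util.List;

class QuoteAwareSplit {
    // A run of characters that are neither a quote nor a backslash,
    // or escape pairs (a backslash followed by any one character).
    private static final String UNIT = "(?:[^'\\\\]|\\\\.)*";

    // Split on spaces only when an even number of unescaped quotes
    // remains between the split point and the end of the input.
    private static final String SPLIT =
            " +(?=(?:" + UNIT + "'" + UNIT + "')*" + UNIT + "$)";

    static List<String> split(String input) {
        return Arrays.asList(input.split(SPLIT));
    }

    public static void main(String[] args) {
        String s = "this is a message 'as is this' another split 'another whole message'";
        for (String tok : split(s)) {
            System.out.printf("<%s>%n", tok);
        }
    }
}
```

With this sketch both quoted sections stay intact, quotes included. The lookahead rescans the rest of the input at every space, so it is meant for short strings rather than large inputs.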

anubhava
  • To make things simpler, I decided to use the string " A message 'with an embedded message!' ". Using your pattern, I receive the strings A message 'with an embed\'ded message!' How would I remove the beginning and ending single quotes, but in a way that replaces the " \' " with a single quote? – Tyler Senter Mar 01 '15 at 08:17
  • Edit to my comment above. The strings are `A` `message` `'with an embedded message!'` Sorry, I'm still not used to SE Markdown – Tyler Senter Mar 01 '15 at 08:33
  • This same code works for `"A message 'with an embedded message!'"` as well – anubhava Mar 01 '15 at 08:34
  • How do I remove the quotes though? I don't want to use the replace method, because that can replace internal quotes. And yet another edit. Finally, the output of your pattern returns `A` `message` `'with an embed\'ded message!'`. Therefore, how would I convert `'with an embed\'ded message!'` to `with an embed'ded message!`? – Tyler Senter Mar 01 '15 at 08:39
  • But please understand that splitting by space is different from removing quotes from output. Better you split it by space as per your question and if needed then replace the quotes. Alternative is to use `Pattern` `Matcher` etc and extract what you want. – anubhava Mar 01 '15 at 08:49
  • I do realize that. Sorry, I didn't mean to sound angry. However, I believe I found another way. Thank you very much for your help! – Tyler Senter Mar 01 '15 at 08:55
  • I just noticed something. If I try to include more than one statement, it's only the last statement that's formatted properly. Otherwise, the string is still split along spaces. It isn't necessary for this to be added, but if it's possible, I'd love to know about it. – Tyler Senter Mar 03 '15 at 03:12
  • But this solution isn't even splitting on sentence ending. It is just splitting on space and leaving quoted strings intact. – anubhava Mar 03 '15 at 07:49
  • I mean, if I use the string `this is a message 'as is this' another split 'another whole message'`, every space is split except the second set of quotes. The first set of quotes is still split – Tyler Senter Mar 03 '15 at 19:53
  • I get a PatternSyntaxException, saying that there is an unclosed character class near index 42, which appears to be the last parenthesis. – Tyler Senter Mar 03 '15 at 23:38

Or you could use this pattern to capture the tokens you want:

('(?:[^']|(?<=\\\\)')*'|\S+)


alpha bravo

I was really overthinking this one.

This should work, and the best part is that it doesn't use lookarounds at all, so it works in nearly every regex implementation, most notably JavaScript.

('[^']*?(?:\\'[^']*?)*'|[^\s]+)

Instead of using a split, use a match to build an array with this regex.
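A minimal sketch of that match-based approach in Java, using `Pattern` and `Matcher`. The class name `QuotedTokenizer` is made up for illustration, and the pattern is the one above with one change that is my assumption rather than part of the answer: a capture group around the quoted content, so the outer quotes can be dropped and each `\'` turned back into `'`, as the question asks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTokenizer {
    // Either a quoted run (group 1 holds the text between the outer quotes,
    // which may contain \' escapes) or a run of non-whitespace characters.
    private static final Pattern TOKEN =
            Pattern.compile("'([^']*?(?:\\\\'[^']*?)*)'|[^\\s]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                // Quoted token: outer quotes are already excluded by the
                // group; turn each escaped \' back into a plain quote.
                tokens.add(m.group(1).replace("\\'", "'"));
            } else {
                // Unquoted token: keep it exactly as matched.
                tokens.add(m.group());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Some message I want to split 'but keeping this a\\'s a single string' Voila!";
        for (String tok : tokenize(s)) {
            System.out.printf("<%s>%n", tok);
        }
    }
}
```

Checking `m.group(1)` instead of calling `replace` on every token means quotes inside unquoted words are left alone, which addresses the concern about `replace` touching internal quotes.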

My objectives were:

  • It can tell an escaped apostrophe from an unescaped one (of course).
  • It's fast. The behemoth I wrote before actually took time.
  • It works with multiple quoted substrings; a lot of the suggestions here don't.

Demo

  • Test String: Discerning between 'the single quote\'s double purpose' as a 'quote marker', like ", and a 'a cotraction\'s marker.'.

    If you asked the author and he was speaking in the third person, he would say 'CFQueryParam\'s example is contrived, and he knew that but he had the world\'s most difficult time thinking up an example.'

    Some message I want to split 'but keeping this a\'s a single string' Voila!

  • Result: Discerning, between, 'the single quote\'s double purpose', as, a, 'quote marker',,, like, ",, and, a, 'a cotraction\'s marker.',.,

    If, you, asked, the, author, and, he, was, speaking, in, the, third, person,, he, would, say, 'CFQueryParam\'s example is contrived, and he knew that but he had the world\'s most difficult time thinking up an example.',

    Some, message, I, want, to, split, 'but keeping this a\'s a single string', Voila!

Regular Jo