I need help with a regular expression in JavaScript. I am looking for a way to replace the substring ~::~ only if it is inside quotes. Here is my case:

Source string:

"aa\"aa\"aa"~::~ "bbb~::~bbb"  "ccc" ~::~ 
                     ^^^^
                     sub string to remove  

Desired string: "aa\"aa\"aa"~::~ "bbbbbb" "ccc" ~::~

Example code:

var str =' "aa\"aa\"aa"~::~ "bbb~::~bbb"  "ccc" ~::~  ';
var re = /(").*?\1/g;    // just found that this is wrong: it doesn't support escaped quotes (VK)
str.replace(re,'');

The problem is that my expression doesn't support escaped quotes.

Thank you very much for your help.

--Vadim

Ro Yo Mi
vadimk
  • Don't make your pattern more complicated than it has to be. To get the quoted parts, just use `"[^"]*"`. ;) – Martin Ender Aug 16 '13 at 18:55
  • What are you trying to replace? Just `~::~`, or any non-alpha character, or something else? – Patrick Evans Aug 16 '13 at 18:57
  • @m.buettner: :) you're right! – vadimk Aug 16 '13 at 19:29
  • @Patrick Evans: yes, the pattern ~::~ is exactly what I need to replace. – vadimk Aug 16 '13 at 19:30
  • Another complexity which I missed at first and found just now: an escaped quote ( \" ) is possible inside the string. In fact it is the value part of a JSON pair ( "key": "... ~::~ ... " ). Sorry, I didn't point this out. So, my own regex is wrong. – vadimk Aug 16 '13 at 19:35

3 Answers

You can use a replace on a regex like this:

~::~(?=(?:[^"]*"[^"]*")*[^"]*"[^"]*$)

It might be a little difficult to understand, but it basically makes sure that the ~::~ you're replacing is followed by an odd number of quotes, which (assuming balanced quotes) means it sits inside a quoted string.

JSFiddle demo.
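
As a rough illustration of that pattern in use, here is a minimal sketch. The sample string and the variable names are made up for the demo, and the input is assumed to have balanced quotes and no escaped quotes:

```javascript
// Sketch only: assumes balanced quotes and no escaped quotes in the input.
// The lookahead succeeds only when an odd number of quotes follows the ~::~.
const insideQuotes = /~::~(?=(?:[^"]*"[^"]*")*[^"]*"[^"]*$)/g;

// Hypothetical sample: one ~::~ inside quotes, two outside.
const sample = '"aaa"~::~ "bbb~::~bbb"  "ccc" ~::~';

const cleaned = sample.replace(insideQuotes, '');
console.log(cleaned);
```

Only the occurrence between `bbb` and `bbb` is followed by an odd number of quotes, so it is the only one removed.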

Okay, with escaped quotes, it's a bit more complicated since the regex has to 'eat' the escaped quotes as well. You can try this:

~::~(?=(?:(?:[^\\"]|\\"|\\\\)*"(?:[^\\"]|\\"|\\\\)*")*(?:[^\\"]|\\"|\\\\)*"(?:[^\\"]|\\"|\\\\)*$)
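
A small sketch of the escape-aware version, under the same balanced-quotes assumption. The sample string is invented for the demo, and `String.raw` is used only so that the backslashes in the sample reach the regex unchanged:

```javascript
// Sketch only: same odd-number-of-quotes idea, but \" and \\ are consumed
// as units so that escaped quotes are not counted.
const insideQuotesEsc =
  /~::~(?=(?:(?:[^\\"]|\\"|\\\\)*"(?:[^\\"]|\\"|\\\\)*")*(?:[^\\"]|\\"|\\\\)*"(?:[^\\"]|\\"|\\\\)*$)/g;

// Hypothetical sample containing escaped quotes inside the first string.
const rawSample = String.raw`"a~::~\"b\"~::~c" ~::~ "d"`;

const rawCleaned = rawSample.replace(insideQuotesEsc, '');
console.log(rawCleaned);
```

Both occurrences inside the first quoted string are removed, while the one between the two strings is kept.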

Jerry
  • Thank you very much Jerry. Works fine for the example I provided. But I forgot to mention that escaped quotes are also possible. I apologize for that. – vadimk Aug 16 '13 at 19:43
  • @Jerry note that this does not account for escaping escapes. I.e. `"abc\\"` is a valid string (and your pattern wouldn't accept the closing quote). – Martin Ender Aug 16 '13 at 20:11
  • @vadimk Hi! Um, I was just testing a bit in fiddle but when I put `var str =' "aa\"aa\"aa"~::~ "bbb~::~bbb" "ccc" ~::~ ';` then `alert(str);`, it removed all the backslashes. Does that mean that the actual variable has `var str =' "aa\\"aa\\"aa"~::~ "bbb~::~bbb" "ccc" ~::~ ';` instead? – Jerry Aug 16 '13 at 21:20
  • @Jerry: Hi Jerry. Let me clarify: after JSON.parse/JSON.stringify the string will be like this: '{"foo":"aaa \"bbb\" ccc ~::~ ddd"}' I want to remove ~::~ from the string, but keep it otherwise, for example '{"foo":[ {...}, ~::~ {...} ] }' – vadimk Aug 16 '13 at 22:04
  • @vadimk No, I mean, that's okay. It's the escaped quotes I was asking about, because to pass a backslash to JS, you need to escape the backslash. I edited my answer and here's a [fiddle](http://jsfiddle.net/Kx5jP/2/) demonstrating the replace on a string a bit more complex. – Jerry Aug 17 '13 at 06:12
  • Hi Jerry. Thank you very much for your help. This code works fine for me. I really appreciate your help! – vadimk Aug 17 '13 at 21:20
Using a replacement callback you can basically nest one replacement inside another:

str = str.replace(/"[^"\\]*(?:\\.[^"\\]*)*"/g, function(m) {
          // m is the matched quoted string itself, not an array
          return m.replace(/~::~/g, "");
      });

The first pattern matches a double-quoted string that allows for escaped quotes (and escaped anything, really), in the form of an unrolling-the-loop pattern.

The callback function receives the entire match as its first argument, a string, followed by any captured subgroups as further arguments (not relevant in your case). We take that entire match, remove all ~::~ from it, and return it.
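
To make the nested replacement concrete, here is a minimal runnable sketch. In JavaScript the callback's first parameter is the matched substring, so the inner replace is called on it directly. The sample string is the one from the question, written with `String.raw` so the backslashes survive; the variable names are only illustrative:

```javascript
// Outer pattern: a double-quoted string, with any backslash-escaped character allowed.
const quoted = /"[^"\\]*(?:\\.[^"\\]*)*"/g;

// Sample from the question; String.raw keeps the \" sequences intact.
const input = String.raw`"aa\"aa\"aa"~::~ "bbb~::~bbb"  "ccc" ~::~`;

// For every quoted string found, strip ~::~ from that string only.
const output = input.replace(quoted, (match) => match.replace(/~::~/g, ''));
console.log(output);
```

Anything outside a quoted string is never handed to the callback, which is why the two unquoted occurrences are left untouched.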

Alternatively, if your quotes are always matched, then the ~::~ you want to remove are always followed by an odd number of ":

str = str.replace(/~::~(?=[^"\\]*(?:\\.[^"\\]*)*"[^"]*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^"]*)*$)/g, "");

It looks horrible, but essentially, it uses the same trick as the pattern above to account for escaping. Then it makes sure to only match exactly one " followed by exactly an even number of " (and arbitrarily many other characters).
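
A short sketch of the lookahead variant, assuming balanced quotes. The sample string is made up for the demo and uses `String.raw` to keep its backslashes:

```javascript
// Sketch only: matches ~::~ when the rest of the input is "remainder of the
// current quoted string, then any number of complete quoted strings".
const insideOnly =
  /~::~(?=[^"\\]*(?:\\.[^"\\]*)*"[^"]*(?:"[^"\\]*(?:\\.[^"\\]*)*"[^"]*)*$)/g;

// Hypothetical sample: one ~::~ inside a string with escaped quotes, one outside.
const text = String.raw`"x~::~\"y\"" ~::~ "z"`;

const stripped = text.replace(insideOnly, '');
console.log(stripped);
```

The `$` anchor matters here: without it the lookahead could stop early and would also accept occurrences outside the quotes.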

Martin Ender
  • Thank you very much m.buettner. Your pattern works fine for the example I provided. But as I mentioned in my comment above, I forgot that escaped quotes are also possible. I apologize for that. – vadimk Aug 16 '13 at 19:45
  • @vadimk edited the first solution. the second one needs some more time for escaping. – Martin Ender Aug 16 '13 at 20:11
  • @vadimk fixed the second one as well – Martin Ender Aug 16 '13 at 20:53
  • I tested your second solution with 'aaa "bbb" "ccc ~::~ ccc" "ddd" ~::~ "eee" '. It removed all ~::~, but I need to keep those which are not in a quoted string. Please let me know if I missed anything. And thank you for the "function" example. I have never seen a regexp developed this way! – vadimk Aug 16 '13 at 22:44
  • @vadimk sorry, I forgot to anchor the lookahead to the end of the string. I'd recommend the first version though, as it's much more readable and therefore maintainable than the second one. – Martin Ender Aug 17 '13 at 09:02
  • Hi m.buettner. Thank you very much for your help. I decided to pick another solution, but your "function" style example helped me look at regex development from another angle. I think I'll apply it in another part of my project. Again, thank you very much for your help! – vadimk Aug 17 '13 at 21:16
Description

Instead of capturing the individual quoted substrings like in your example, why not do this in one operation, where the offending strings are simply replaced and the others are ignored?

These expressions will:

  • ignore escaped quotes like "some \"text is quoted\" in here"
  • find the desired ~::~ either inside or outside the quoted sections; which of the two is matched is determined by the specific expression
  • assume the input string already has properly balanced quotes

Note that the only difference between the two is whether the lookahead is negative or positive.

Regex: ~::~(?!(?:(?:\\"|[^\\"])*(?:"(?:\\"|[^"])*){2})*$) This finds the ~::~ which are inside quoted strings.

Regex: ~::~(?=(?:(?:\\"|[^\\"])*(?:"(?:\\"|[^"])*){2})*$) This finds the ~::~ which are outside quoted strings; it is included here for extra credit but not demonstrated below.

Replace with: empty string

Example

In the live demo, the field of interest is "input.replace()", which shows the output.

Sample Text

~::~ aaa "bbb" "ccc ~::~ cc\"c ~::~ ccc" "ddd" ~::~ "eee" ~::~

After Replacement

~::~ aaa "bbb" "ccc  cc\"c  ccc" "ddd" ~::~ "eee" ~::~
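
The replacement on the sample text can be reproduced with a few lines of JavaScript. This is only a sketch: the variable names are illustrative, and `String.raw` is used so that the `\"` in the sample text is passed through literally:

```javascript
// Negative-lookahead form: match ~::~ unless the rest of the input can be
// read as a sequence of complete quoted strings.
const insideQuoted = /~::~(?!(?:(?:\\"|[^\\"])*(?:"(?:\\"|[^"])*){2})*$)/g;

// The sample text from this answer.
const sampleText = String.raw`~::~ aaa "bbb" "ccc ~::~ cc\"c ~::~ ccc" "ddd" ~::~ "eee" ~::~`;

const replaced = sampleText.replace(insideQuoted, '');
console.log(replaced);
```

Removing a `~::~` leaves the spaces on both sides of it in place, so the quoted section ends up with two consecutive spaces at each removal point. One caveat worth checking against your own data: because `[^"]` also accepts a backslash, an input such as `"x~::~y" "a\"b"`, where a later quoted string contains an escaped quote, appears to let the lookahead re-pair the quotes, and the `~::~` is then left in place.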


Or

If you really want to just capture the quoted strings while ignoring the escaped quotes, then:

"(?:\\"|[^"])*"

Example

Sample Text

~::~ aaa "bbb" "ccc ~::~ cc\"c ~::~ ccc" "ddd" ~::~ "eee" ~::~

Matches

[0] => "bbb"
[1] => "ccc ~::~ cc\"c ~::~ ccc"
[2] => "ddd"
[3] => "eee"
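
A minimal sketch of collecting those matches in JavaScript, using `String.prototype.match` with the global flag. The variable names are illustrative, and `String.raw` keeps the backslash in the sample text:

```javascript
// Quoted string that treats \" as part of the content.
const quotedString = /"(?:\\"|[^"])*"/g;

// The sample text from this answer.
const source = String.raw`~::~ aaa "bbb" "ccc ~::~ cc\"c ~::~ ccc" "ddd" ~::~ "eee" ~::~`;

// With the g flag, match() returns an array of all matched substrings.
const matches = source.match(quotedString);
matches.forEach((m, i) => console.log(`[${i}] => ${m}`));
```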
animuson
Ro Yo Mi
  • Hi Denomales. Thank you very much for your answer. I was impressed with the clean and detailed explanation and examples. For now I picked another solution, but I'm thinking about whether I can apply this style to my other project. Again, thank you so much! – vadimk Aug 17 '13 at 21:12