
Regex Dialect: JavaScript

I have the following capture group (('|").*?[^\\\2]\2) that selects a quoted string excluding escaped quotes.

It matches these, for example:

"Felix's pet"
'Felix\'s pet'

However I would now like to remove all whitespace from a string except anything matching this pattern. Is there perhaps a way to back reference the capture group \1 and then exclude it from the matches?

I have attempted to do so with my limited RegEx knowledge, but so far I can only select the space immediately preceding or following the pattern.

I have saved my test script on regexr for convenience if you would like to play around with my example.

Intended results:

key : string becomes key:string

dragon : "Felix's pet" becomes dragon:"Felix's pet"

"Hello World" something here "Another String"

becomes

"Hello World"somethinghere"Another String"

etc...

SnareChops
  • @anubhava: I disagree with the dupe vote - the accepted answer uses a strategy that only works with the special string structure in that question, and your (better) answer fails with escaped quotes. Voting to reopen. – Tim Pietzcker Dec 06 '15 at 10:53
  • @TimPietzcker: Fair enough, I am sure there is a better dup but I just couldn't find using my search. – anubhava Dec 06 '15 at 10:54
  • @SnareChops: Your regex is trying to take escaped quotes into account, but it does so incorrectly (your character class is wrong, and even if it worked, you should consider the case `'foo \\'` where there is a backslash before the closing quote, but it's not an escaping backslash). Would you need to handle such a case? – Tim Pietzcker Dec 06 '15 at 10:56

2 Answers


This is extremely hard to do with regular expressions. The following works:

result = subject.replace(/ (?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)/g, "");

I've built this answer from one of my earlier answers to a similar, but not identical question; therefore I'll refer you to it for an explanation.

You can test it live on regex101.com.
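As a quick way to try this locally, here is a minimal sketch that applies the regex above, copied verbatim, to the inputs from the question. The helper name `removeSpacesOutsideQuotes` is hypothetical and only there for illustration. Note that the pattern starts with a literal space, so tabs and newlines are left alone as written.

```javascript
// Regex copied verbatim from the answer; the helper name is hypothetical.
var outsideQuotes = / (?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)/g;

function removeSpacesOutsideQuotes(subject) {
  // Each space is removed only if both lookaheads succeed, i.e. the rest of
  // the string contains balanced single quotes and balanced double quotes.
  return subject.replace(outsideQuotes, "");
}

// Inputs taken from the question.
console.log(removeSpacesOutsideQuotes('key : string'));
console.log(removeSpacesOutsideQuotes('dragon : "Felix\'s pet"'));
console.log(removeSpacesOutsideQuotes('"Hello World" something here "Another String"'));
```

Because every candidate space triggers two scans to the end of the string, this is probably best kept to short inputs.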

Tim Pietzcker

In JavaScript, you can pass a function as the second argument to String.prototype.replace. That lets you define capture groups and choose the replacement for each match separately.

You want to match all whitespace:

\s+

and you need to match everything inside quotes:

(('|")(?:[^\\]\\\2|.)*?\2)

so you combine the two:

var pattern = /\s+|(('|")(?:[^\\]\\\2|.)*?\2)/g

and you write the replace call with an anonymous function as the second argument:

var filteredString = notFilteredString.replace(pattern,
        function(match, group1) { return group1 || "" })

For each match, the function is called to produce the replacement string. The regex matches either a run of whitespace or a quoted string. The quoted string is captured as group1, and the anonymous function returns group1 if it matched, or the empty string "" when the match was whitespace.
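Putting the pieces together, here is a minimal runnable sketch of the same idea. Two things in it are my assumptions rather than part of the answer: the helper name `stripOutsideQuotes` is hypothetical, and the quoted-string part is swapped for the common `"(?:\\.|[^"\\])*"` form (one alternative per quote type), which treats any backslash as escaping the next character.

```javascript
// Hypothetical helper name; the quoted-string pattern is a swapped-in variant,
// not the one from the answer above.
function stripOutsideQuotes(input) {
  // Alternative 1: a run of whitespace, which gets dropped.
  // Alternative 2 (group 1): a double- or single-quoted string in which
  // a backslash escapes the following character.
  var pattern = /\s+|("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')/g;
  return input.replace(pattern, function (match, quoted) {
    // quoted is undefined when the whitespace alternative matched.
    return quoted || "";
  });
}

// Inputs taken from the question.
console.log(stripOutsideQuotes('key : string'));
console.log(stripOutsideQuotes('dragon : "Felix\'s pet"'));
console.log(stripOutsideQuotes('"Hello World" something here "Another String"'));
```

Since the quoted alternative consumes a whole string in one match, whitespace inside quotes is never seen by the `\s+` branch. An unterminated quote simply fails to match the second alternative, so whitespace after it is removed like any other.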

Rudolf Gröhling