4

I'm trying to write a regex that removes all spaces that are not inside quotes, so that something like this:

a = 4, b = 2, c = "space here"

would return this:

a=4,b=2,c="space here"

I spent some time searching this site and found a similar Q&A ("Split a string by spaces -- preserving quoted substrings -- in Python") that replaces all the spaces inside quotes with a token, which can be substituted back in after wiping all the other spaces... but I was hoping there was a cleaner way of doing it.


4 Answers

8

It's worth noting that any regular expression solution will fail in cases like the following:

a = 4, b = 2, c = "space" here"

You could construct a regexp to handle the three-quote case specifically, but you cannot solve delimiter matching in the general sense, where quotes or braces may nest to arbitrary depth. This is a mathematically provable limitation of finite automata (DFAs), which classical regexps directly represent. To perform any serious brace/quote matching, you need the more powerful pushdown automaton, usually in the form of a parser built with a tool or library such as ANTLR, Bison, or Parsec.

With that said, it sounds like regular expressions should be sufficient for your needs. Just be aware of the limitations.
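If you would rather detect the ambiguous input than silently mangle it, a small character scanner can track whether it is inside quotes and complain when a quote is left open. This is only a sketch under assumptions: the method name `strip_unquoted_spaces_strict` is made up for illustration, only double quotes are recognized, and escaped quotes are not handled.

```ruby
# Hypothetical helper: removes spaces outside double quotes and raises
# if the input ends while a quote is still open.
def strip_unquoted_spaces_strict(text)
  in_quotes = false
  out = String.new
  text.each_char do |ch|
    # Every double quote flips the "inside quotes" state.
    in_quotes = !in_quotes if ch == '"'
    # Keep everything except spaces that sit outside quotes.
    out << ch unless ch == " " && !in_quotes
  end
  raise ArgumentError, "unbalanced double quote" if in_quotes
  out
end

puts strip_unquoted_spaces_strict('a = 4, b = 2, c = "space here"')
# prints: a=4,b=2,c="space here"
```

Calling it with the three-quote input above raises `ArgumentError` instead of guessing which quote was meant to close the string.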

Daniel Spiewak
5

This seems to work:

result = string.gsub(/( |(".*?"))/, "\\2")
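To try the one-liner on the input from the question, it can be wrapped in a small method. The method name `strip_unquoted_spaces` is hypothetical; the regex and replacement string are the ones shown above. The trick is that a lone space leaves group 2 unset, so `\2` expands to nothing, while a quoted run is captured in group 2 and put back unchanged.

```ruby
# Hypothetical wrapper around the gsub call shown above.
def strip_unquoted_spaces(text)
  # Each match is either a single space (group 2 is empty, so the space
  # is dropped) or a double-quoted run (group 2 holds it, so it is kept).
  text.gsub(/( |(".*?"))/, "\\2")
end

puts strip_unquoted_spaces('a = 4, b = 2, c = "space here"')
# prints: a=4,b=2,c="space here"
```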
Borgar
  • If you get into both single- and double-quoted strings, you need to match each opening quote mark with the same kind of closing quote mark. – Gene T Oct 16 '08 at 09:05
2

I consider this very clean:

mystring.scan(/((".*?")|([^ ]))/).map { |x| x[0] }.join

I doubt gsub could do any better (assuming you want a pure regex approach).
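For reference, here is the scan approach run against the question's input, plus a variant without capture groups. The method names are hypothetical. When a pattern contains groups, `scan` returns an array of group arrays, which is why the original needs `map`; without groups it returns the matched strings directly, so `join` alone is enough.

```ruby
# Hypothetical wrapper around the scan call shown above.
def strip_unquoted_spaces_scan(text)
  # Each match is a quoted run or one non-space character; spaces
  # outside quotes match neither alternative and are simply skipped.
  text.scan(/((".*?")|([^ ]))/).map { |groups| groups[0] }.join
end

# Same idea without capture groups, so scan yields plain strings.
def strip_unquoted_spaces_scan_simple(text)
  text.scan(/".*?"|[^ ]/).join
end

puts strip_unquoted_spaces_scan('a = 4, b = 2, c = "space here"')
# prints: a=4,b=2,c="space here"
```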

Rômulo Ceccon
0

Try this one. It also matches single- and double-quoted strings, including ones that contain backslash-escaped quotes, so you need to filter those matches out if you only want the spaces:

/( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/
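One way to do that filtering, sketched here with the block form of `gsub`: drop the match when it is a bare space and return it unchanged when it is a quoted string. The constant and method names are made up for illustration; the regex is the one above.

```ruby
# The pattern from above: a space, a double-quoted string, or a
# single-quoted string, where a backslash escapes the next character.
QUOTED_OR_SPACE = /( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/

# Hypothetical helper: removes spaces outside either kind of quotes.
def strip_unquoted_spaces_escaped(text)
  # Bare spaces are replaced with nothing; quoted strings pass through.
  text.gsub(QUOTED_OR_SPACE) { |match| match == " " ? "" : match }
end

puts strip_unquoted_spaces_escaped(%q(a = "x \" y", b = 'p q'))
# prints: a="x \" y",b='p q'
```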
Senmiao Liu