
I'm implementing search on my website and would like to support searching for exact phrases. I want to end up with an array of terms to search for; here are some examples:

"foobar \"your mom\" bar foo" => ["foobar", "your mom", "bar", "foo"]

"ruby rails'test course''test lesson'asdf" => ["ruby", "rails", "test course", "test lesson", "asdf"]

Notice that there doesn't necessarily have to be a space before or after the quotes.

I'm not well versed in regular expressions, and splitting the string repeatedly on single characters seems unnecessary. Can anybody help me out? Thanks.

davidcelis
  • You changed the problem statement and now my answer is incomplete. Before I update my answer, are you sure about this one then? Also, can you escape quotes in your strings? – polygenelubricants Jul 23 '10 at 18:15

1 Answer

You want to use this regular expression (see on rubular.com):

/"[^"]*"|'[^']*'|[^"'\s]+/

This regex matches the tokens instead of the delimiters, so you'd want to use scan instead of split.

The […] construct is called a character class. [^"] is "anything but the double quote".

There are essentially 3 alternates:

  • "[^"]*" - double quoted token (may include spaces and single quotes)
  • '[^']*' - single quoted token (may include spaces and double quotes)
  • [^"'\s]+ - a token consisting of one or more characters that are neither quotes nor whitespace
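
To see the three alternatives in isolation, here is a small sketch that runs the same pattern over one sample string per case. The constant name TOKEN_PATTERN and the sample strings are made up for illustration; they are not from the question.

```ruby
# Same pattern as above; the constant name is illustrative only.
TOKEN_PATTERN = /"[^"]*"|'[^']*'|[^"'\s]+/

# First alternative: a double-quoted token may contain spaces and single quotes.
double_quoted = %q{"it's here"}.scan(TOKEN_PATTERN)

# Second alternative: a single-quoted token may contain spaces and double quotes.
single_quoted = %q{'say "hi" now'}.scan(TOKEN_PATTERN)

# Third alternative: bare words are separated by whitespace.
bare = "plain words".scan(TOKEN_PATTERN)

p double_quoted  # ["\"it's here\""]
p single_quoted  # ["'say \"hi\" now'"]
p bare           # ["plain", "words"]
```

Note that each quoted token comes back as a single element with its quotes still attached.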


Snippet

Here's a Ruby implementation:

s = %_foobar "your mom"bar'test course''test lesson'asdf_
puts s

puts s.scan(/"[^"]*"|'[^']*'|[^"'\s]+/)

The above prints (as seen on ideone.com):

foobar "your mom"bar'test course''test lesson'asdf
foobar
"your mom"
bar
'test course'
'test lesson'
asdf
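
The output above still carries the surrounding quotes, while the question asks for bare phrases such as your mom. One possible way to drop them, sketched here as an assumption rather than as part of the original answer, is to put a capture group around the inner part of each alternative. When the pattern contains groups, scan returns one array of captures per match, and the groups that did not take part in the match are nil. The method name tokenize is hypothetical.

```ruby
# Hypothetical helper: split a search query into terms, keeping quoted
# phrases together and dropping the surrounding quotes.
def tokenize(query)
  # One capture group per alternative; only the group belonging to the
  # alternative that matched is non-nil, so compact.first picks it out.
  query.scan(/"([^"]*)"|'([^']*)'|([^"'\s]+)/).map { |groups| groups.compact.first }
end

p tokenize(%q{foobar "your mom" bar foo})
p tokenize(%q{ruby rails'test course''test lesson'asdf})
```

For the two inputs from the question this gives ["foobar", "your mom", "bar", "foo"] and ["ruby", "rails", "test course", "test lesson", "asdf"]. A quote with no matching partner is simply skipped, so foo "bar comes back as ["foo", "bar"].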

polygenelubricants
  • @davidcelis: you want to use this with `scan`, not `split`. I'll revise the answer shortly (since you also changed the problem statement). – polygenelubricants Jul 23 '10 at 18:17
  • Sorry about that. I realized that my second example may have not been clear enough. Thank you for helping! – davidcelis Jul 23 '10 at 18:18
  • Just gave that regex a shot with scan, and changing it to `/['"][^\['"\]]*['"]|[^\['"\]]+/` to allow single quotes as well; as far as I can tell, it's working beautifully. – davidcelis Jul 23 '10 at 18:23
  • @davidcelis: see my latest revision; tell me if there's anything else I can do. Also, please upvote if my answer is useful. – polygenelubricants Jul 23 '10 at 18:28