Ruby .scan method returns empty using regex

Question

Given a string like "\"turkey AND ham\" NOT \"roast beef\"", I need to get an array of the inner strings, like so: ["turkey AND ham", "roast beef"], and eliminate any ORs, ANDs and NOTs that may or may not be there.

With the help of Rubular I came up with this regex /\\["']([^"']*)\\["']/

which returns the following 2 groups:

Match 1
1. turkey AND ham
Match 2
1. roast beef

However, when I use it with .scan I keep getting an empty array.

I looked at this and this other SO post, and a few others, but cannot figure out where I am going wrong.

Here is the result from my rails console:

=> q = "\"turkey and ham\" OR \"roast beef\""
=> q.scan(/\\["']([^"']*)\\["']/)
=> []

Expectation: ["turkey AND ham", "roast beef"]

I shall also mention I suck at regex.

You seem to overescape the pattern. Use `q.scan(/["']([^"']*)["']/)`. With double backslashes, you defined a literal backslash, and since there is no backslash in the string, nothing matches. — Wiktor Stribiżew, Oct 13 '16 at 17:30
To expand on what @WiktorStribiżew stated: your actual string is `'"turkey AND ham" NOT "roast beef"'`. The `\` characters only escape the double quotes in the displayed output, and the regex he posted will perform correctly. [Example](http://rubular.com/r/kW2pP3zjum) — engineersmnky, Oct 13 '16 at 17:37

Cary Swoveland · Answer 1 · 2016-10-14T05:12:55.207

When the regex used with scan contains a capture group (@davidhu2000's approach), one generally can use lookarounds¹ instead. It's just a matter of personal preference. To allow for double-quoted strings that contain either single- or (escaped) double-quoted strings, you could use the following regex.

r = /
    (?<=") # match a double quote in a positive lookbehind
    [^"]+  # match one or more characters that are not double-quotes
    (?=")  # match a double quote in a positive lookahead
    |      # or
    (?<=') # match a single quote in a positive lookbehind
    [^']+  # match one or more characters that are not single-quotes
    (?=')  # match a single quote in a positive lookahead
    /x    # free-spacing regex definition mode

"\"turkey AND ham\" NOT 'roast beef'".scan(r)
  #=> ["turkey AND ham", "roast beef"]

Because the single-quoted literal '"turkey AND ham" NOT "roast beef"' evaluates to "\"turkey AND ham\" NOT \"roast beef\"" (that is simply how Ruby displays the string), it is the same string, so it is not an additional case to deal with.
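The following is a small sketch, not part of the original answer, that checks the claim about the two literals and then runs the lookaround pattern on the question's own input. The variable names are made up for illustration, and `r` is the same pattern written on one line. One thing that seems worth checking: when both phrases are double-quoted, the text between the closing quote of one phrase and the opening quote of the next is also flanked by double quotes, so the lookaround pattern appears to pick it up as well.

```ruby
# Same pattern as above, written without free-spacing mode.
r = /(?<=")[^"]+(?=")|(?<=')[^']+(?=')/

single_quoted = '"turkey AND ham" NOT "roast beef"'
double_quoted = "\"turkey AND ham\" NOT \"roast beef\""

# Both literals produce the same string.
same_string = single_quoted == double_quoted
p same_string        #=> true

# Lookarounds do not consume the quotes, so " NOT " also sits between two quotes.
with_lookarounds = double_quoted.scan(r)
p with_lookarounds   #=> ["turkey AND ham", " NOT ", "roast beef"]

# A capture-group pattern consumes both quotes of each phrase.
with_capture = double_quoted.scan(/"([^"]*)"/).flatten
p with_capture       #=> ["turkey AND ham", "roast beef"]
```

If the operators between phrases must never show up in the result, the capture-group form may be the safer choice for inputs like the one in the question.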

¹ For any in the audience who still consider regular expressions to be black magic, there are four kinds of lookarounds (positive and negative lookbehinds and lookaheads) as elaborated in the doc for Regexp. Sometimes they are regarded as "zero-width" matches as they are not part of the matched text.
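As a minimal illustration of the four kinds of lookarounds mentioned in the footnote, here is a sketch on two made-up strings (the strings and variable names are assumptions chosen only for the demo). In each case the lookaround checks the neighbouring character without including it in the match.

```ruby
behind_text = "a1 b2 a3"
ahead_text  = "1x 2y 3x"

# Positive lookbehind: digits preceded by "a".
positive_lookbehind = behind_text.scan(/(?<=a)\d/)
# Negative lookbehind: digits not preceded by "a".
negative_lookbehind = behind_text.scan(/(?<!a)\d/)
# Positive lookahead: digits followed by "x".
positive_lookahead = ahead_text.scan(/\d(?=x)/)
# Negative lookahead: digits not followed by "x".
negative_lookahead = ahead_text.scan(/\d(?!x)/)

p positive_lookbehind  #=> ["1", "3"]
p negative_lookbehind  #=> ["2"]
p positive_lookahead   #=> ["1", "3"]
p negative_lookahead   #=> ["2"]
```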

Elegant solution without the need to flatten any array. Though I still consider regex _a sort of black magic_ :) — Jax, Oct 14 '16 at 08:31

score 2 · Accepted Answer · answered Oct 13 '16 at 17:37

Your regex is trying to match a literal \, which matches nothing in the string: the \ in the literal only escapes the double quote and is not part of the string itself.
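A quick way to convince yourself of this in a console is sketched below. It reuses the `q` from the question; the other variable names are only for illustration.

```ruby
# The backslashes in the literal escape the quotes; they are not stored.
q = "\"turkey and ham\" OR \"roast beef\""

puts q  # prints: "turkey and ham" OR "roast beef"

# The string contains no backslash character at all.
has_backslash = q.include?("\\")
p has_backslash  #=> false

# The over-escaped pattern requires a literal backslash before each quote,
# so it cannot match anywhere in q.
over_escaped = q.scan(/\\["']([^"']*)\\["']/)
p over_escaped   #=> []
```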

So remove the \\ from your regex:

res = q.scan(/["']([^"']*)["']/)

This returns a two-dimensional array:

res = [["turkey and ham"], ["roast beef"]]

Each inner array holds the capture groups of one match, so if you have two capture groups in your regex, you will see two items in each inner array.
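To see how the number of capture groups shapes the result, here is a small sketch. The two-group pattern is a hypothetical variation (it additionally captures the opening quote) and the input mixes quote styles only to make the first group visible.

```ruby
q = "\"turkey and ham\" OR 'roast beef'"

# No capture groups: scan returns the whole matches as a flat array.
no_groups = q.scan(/["'][^"']*["']/)
p no_groups   #=> ["\"turkey and ham\"", "'roast beef'"]

# One capture group: one item per inner array.
one_group = q.scan(/["']([^"']*)["']/)
p one_group   #=> [["turkey and ham"], ["roast beef"]]

# Two capture groups (opening quote, then text): two items per inner array.
two_groups = q.scan(/(["'])([^"']*)["']/)
p two_groups  #=> [["\"", "turkey and ham"], ["'", "roast beef"]]
```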

If you want a simple array, you can call the flatten method on the result.
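For example, a sketch using the `q` from the question. `map(&:first)` is shown as an alternative that should give the same result when the regex has exactly one capture group.

```ruby
q = "\"turkey and ham\" OR \"roast beef\""

nested = q.scan(/["']([^"']*)["']/)
p nested  #=> [["turkey and ham"], ["roast beef"]]

# Collapse the nested arrays into a single flat array.
flat = nested.flatten
p flat    #=> ["turkey and ham", "roast beef"]

# Alternative: take the first (and only) capture group of each match.
firsts = nested.map(&:first)
p firsts  #=> ["turkey and ham", "roast beef"]
```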

davidhu


Now, the only issue OP has is to allow matching `'some "string"'` and `"some 'string'"` – Wiktor Stribiżew Oct 13 '16 at 17:41
@Wiktor, since `'some "string"' #=> "some \"string\"" `, I don't think that one needs attention, but yes on the other. – Cary Swoveland Oct 13 '16 at 19:03
@CarySwoveland: I do not think this sample text has much to do with Ruby. – Wiktor Stribiżew Oct 13 '16 at 19:07
@Wiktor, I don't follow, but the regex works fine when the string is single-quoted: `'"turkey AND ham" OR "roast beef"'.scan(/["']([^"']*)["']/) #=> [["turkey AND ham"], ["roast beef"]]`. – Cary Swoveland Oct 13 '16 at 19:18
I don't know if it's a problem, but `"\"hello'".scan(/["']([^"']*)["']/) => [["hello"]]`. – Cary Swoveland Oct 13 '16 at 19:34
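
One possible way to handle the cases raised in these comments, sketched here as an assumption rather than something proposed in the thread, is to capture the opening quote and require the same character to close the phrase with a backreference (`\1`). The name `paired` is made up for the example.

```ruby
# Group 1 is the opening quote, group 2 the text up to the next identical quote.
paired = /(["'])(.*?)\1/

# A mismatched pair no longer counts as a quoted phrase.
mismatched = "\"hello'".scan(paired)
p mismatched  #=> []

# A quote of the other kind may appear inside a phrase.
mixed = "\"some 'string'\" AND 'ham'".scan(paired).map(&:last)
p mixed       #=> ["some 'string'", "ham"]
```

Because the pattern now has two capture groups, each match yields a pair, so `map(&:last)` is used here instead of `flatten` to keep only the text.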
