1

I'm trying to convert the following Python regex to Ruby:

match = re.search(r'window.__APOLLO_STATE__ = JSON.parse\("(.+?)"\);', body)

I've done some digging, and Regexp#match should be what I'm looking for, but the following returns nil.

resp.body.match('^window.__APOLLO_STATE__ = JSON.parse\("(.+?)"\)')

How can I convert the regex and where am I wrong?

user2954587

3 Answers

1

You may use

resp.body[/window\.__APOLLO_STATE__ = JSON\.parse\("(.*?)"\);/, 1]

Here,

  • /.../ is a regex literal notation that is very convenient when defining regex patterns
  • Literal dots are escaped; otherwise, they match any character except line break characters
  • The .+? is changed to .*? so that empty values can match too. With .+?, an empty value forces the match to run on to a later "); (an overmatch), and it is easier to discard empty matches afterwards than to fix overmatches
  • 1 tells the engine to return the value of the capturing group with ID 1 of the first match. If you need multiple matches, use resp.body.scan(/regex/).
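
The points above can be sketched as follows. This is only an illustration: `body` is a hypothetical stand-in for `resp.body`, and the JSON payload in it is invented.

```ruby
# Hypothetical page body standing in for resp.body; the payload is invented.
# The quoted heredoc identifier keeps the backslashes exactly as written.
body = <<~'HTML'
  <script>
    window.__APOLLO_STATE__ = JSON.parse("{\"id\":1}");
  </script>
HTML

pattern = /window\.__APOLLO_STATE__ = JSON\.parse\("(.*?)"\);/

# String#[] with a regex and a group index returns that group of the first match.
first = body[pattern, 1]

# String#scan returns one array of captures per match when the pattern has groups.
all_matches = body.scan(pattern)

# When nothing matches, String#[] returns nil instead of raising.
missing = "no state here"[pattern, 1]
```

Because a failed lookup yields nil, it is probably worth checking the result before passing it on to a JSON parser.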
Wiktor Stribiżew
0

An idiomatic way is to use the =~ regex match operator:

resp.body =~ /^window.__APOLLO_STATE__ = JSON.parse\("(.+?)"\)/

You can access the capture groups with $1, $2, and so on.

If you don't like the global variable usage, you can also use the Regexp#match method

result = /^window.__APOLLO_STATE__ = JSON.parse\("(.+?)"\)/.match(resp.body)
result[1] # => returns first capture group
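
A minimal sketch of both styles side by side, assuming a made-up `body` string in place of `resp.body`. It also shows the nil case, which is what the question ran into:

```ruby
# Hypothetical input standing in for resp.body.
body = 'window.__APOLLO_STATE__ = JSON.parse("my dog has fleas");'
pattern = /^window.__APOLLO_STATE__ = JSON.parse\("(.+?)"\)/

# =~ returns the index of the match (or nil) and sets $1, $2, ... as a side effect.
index = body =~ pattern
via_global = $1

# Regexp#match returns a MatchData object, or nil when there is no match.
result = pattern.match(body)
via_match_data = result && result[1]

# A non-matching input yields nil, so guard before indexing into the result.
no_match = pattern.match("unrelated text")
```

Note that `^` anchors at the start of a line in Ruby, so the pattern only matches when the assignment begins a line of the input.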
Andrew Schwartz
0

As I understand, your string is something like

str = 'window.__APOLLO_STATE__ = JSON.parse("my dog has fleas");'

and you wish to extract the text between the double quotes. You can do that with the following regular expression, which does not employ a capture group:

r = /\Awindow\.__APOLLO_STATE__ = JSON\.parse\(\"\K.+?(?=\"\);\z)/

str[r]
  #=> "my dog has fleas"

The regular expression can be written in free-spacing mode to make it self-documenting:

r = /
    \A          # match beginning of string
    window\.__APOLLO_STATE__\ =\ JSON\.parse\(\"
                # match substring
    \K          # discard everything matched so far 
    .+?         # match 1+ characters, lazily
    (?=\"\);\z) # match "); followed by end-of-string (positive lookahead)
    /x          # free-spacing regex definition mode

The contents of a positive lookahead must be matched but are not part of the match returned. Neither is the text matched prior to the \K directive part of the match returned.
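
To make the effect of \K and the lookahead concrete, here is a small sketch comparing this approach with a capture-group version of the same pattern. The variable names and the `\Z` variant are illustrative additions, not part of the original expression.

```ruby
str = 'window.__APOLLO_STATE__ = JSON.parse("my dog has fleas");'

with_k     = /\Awindow\.__APOLLO_STATE__ = JSON\.parse\("\K.+?(?="\);\z)/
with_group = /\Awindow\.__APOLLO_STATE__ = JSON\.parse\("(.+?)"\);\z/

# With \K and a lookahead, the whole match is already just the quoted text.
whole_match = with_k.match(str)[0]

# The capture-group version needs the group index to get the same text.
group_one = str[with_group, 1]

# \z demands the true end of the string, so a trailing newline prevents a match.
str_nl = str + "\n"
strict = str_nl[with_k]

# \Z also accepts a single trailing newline before the end of the string.
lenient = str_nl[/\Awindow\.__APOLLO_STATE__ = JSON\.parse\("\K.+?(?="\);\Z)/]
```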

Free-spacing mode removes unescaped whitespace (outside character classes) before the expression is parsed. Accordingly, any intended spaces (in "APOLLO_STATE__ = JSON", for example) must be protected. I've done that by escaping the spaces, one of several ways that can be done; putting each space in a character class ([ ]) is another.
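
A short sketch of how whitespace behaves in free-spacing mode, using toy patterns chosen only for illustration:

```ruby
# Unescaped whitespace is ignored in /x mode, so this behaves like /ab/.
ignored = /a b/x

# Two ways to keep a literal space: escape it, or put it in a character class.
escaped  = /a\ b/x
in_class = /a[ ]b/x

matches_without_space = ignored.match?("ab")
matches_with_space    = ignored.match?("a b")
escaped_ok            = escaped.match?("a b")
in_class_ok           = in_class.match?("a b")
```

Here `ignored` matches "ab" but not "a b", while the other two patterns match "a b".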

Cary Swoveland