
I am using urllib.parse and I need to extract one query value from a URL. It worked with one URL, but on the other URL parse_qs returns an empty dictionary.

Code that doesn't work:

from urllib.parse import urlparse
from urllib.parse import parse_qs

url = 'https://newassets.hcaptcha.com/captcha/v1/000919d/static/hcaptcha.html#frame=checkbox&id=0d4abkdnbvpa&host=2captcha.com&sentry=true&reportapi=https%3A%2F%2Faccounts.hcaptcha.com&recaptchacompat=off&custom=false&hl=ru&tplinks=on&sitekey=41b778e7-8f20-45cc-a804-1f1ebb45c579&theme=light&origin=https%3A%2F%2F2captcha.com'

parsed_url = urlparse(url)
captured_value = parse_qs(parsed_url.query)

print(captured_value)

Output:

{}

The code that works:

from urllib.parse import urlparse
from urllib.parse import parse_qs

url = 'http://foo.appspot.com/abc?def=ghi'

parsed_url = urlparse(url)
captured_value = parse_qs(parsed_url.query)

print(captured_value)

Output:

{'def': ['ghi']}
Eugene

1 Answer


Your first URL contains no question mark, so it has no query string. urlparse() puts everything after the '#' into parsed_url.fragment and leaves parsed_url.query as an empty string, which is why parse_qs returns an empty dict.
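A quick way to see this is to print the components urlparse() produces. This is a minimal sketch; the URL below is a shortened version of the one in the question (trimmed for brevity, not the exact original):

```python
from urllib.parse import urlparse

# Shortened form of the first URL from the question (assumption: trimmed for readability)
url = ('https://newassets.hcaptcha.com/captcha/v1/000919d/static/hcaptcha.html'
       '#frame=checkbox&host=2captcha.com&sitekey=41b778e7-8f20-45cc-a804-1f1ebb45c579')

parsed_url = urlparse(url)
print(repr(parsed_url.query))  # '' because there is no '?' in the URL
print(parsed_url.fragment)     # everything after the '#'
```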

Instead, the arguments are passed in the URL fragment (of hcaptcha.html), because they follow the # in the URL. The fragment is meant for JavaScript running in the browser (here, hcaptcha.html will have JS that reads these arguments to control the behaviour of that page) and is not sent to the backend/server.
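Because this fragment uses the same key=value&... syntax as a query string, one option (a sketch, not the only approach) is to pass parsed_url.fragment to parse_qs instead of parsed_url.query:

```python
from urllib.parse import urlparse, parse_qs

url = 'https://newassets.hcaptcha.com/captcha/v1/000919d/static/hcaptcha.html#frame=checkbox&id=0d4abkdnbvpa&host=2captcha.com&sentry=true&reportapi=https%3A%2F%2Faccounts.hcaptcha.com&recaptchacompat=off&custom=false&hl=ru&tplinks=on&sitekey=41b778e7-8f20-45cc-a804-1f1ebb45c579&theme=light&origin=https%3A%2F%2F2captcha.com'

parsed_url = urlparse(url)
# Decode the fragment as if it were a query string; values come back as lists
fragment_params = parse_qs(parsed_url.fragment)

print(fragment_params['sitekey'])  # ['41b778e7-8f20-45cc-a804-1f1ebb45c579']
print(fragment_params['origin'])   # ['https://2captcha.com'] (percent-decoded by parse_qs)
```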

bigkeefer
  • Thank you. I changed the '#' to '?' and now everything works. Since I don't need the URL itself, only the 'sitekey' value from it, this is the solution for me. – Eugene Feb 13 '23 at 12:28
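An alternative to editing the URL string is to look in both components. Below is a minimal sketch using a hypothetical helper, get_param (not part of urllib), which checks the query string first and falls back to the fragment. One possible reason to prefer this: if a URL ever contained both a '?' and a '#', replacing '#' with '?' would leave a second '?' inside the query, and parse_qs would fold it into the preceding value.

```python
from urllib.parse import urlparse, parse_qs

def get_param(url, name):
    """Hypothetical helper: return the first value of `name` from the URL's
    query string, falling back to its fragment; None if it is absent."""
    parsed = urlparse(url)
    for component in (parsed.query, parsed.fragment):
        values = parse_qs(component).get(name)
        if values:
            return values[0]
    return None

# Works for the original query-string URL and for a fragment-style URL
print(get_param('http://foo.appspot.com/abc?def=ghi', 'def'))
print(get_param('https://example.com/page#frame=checkbox&sitekey=abc123', 'sitekey'))
```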