Why doesn't urllib parse the query values of my link?

Question

I am using urllib.parse and need to extract one query value from a URL. It worked with one URL, but parse_qs on the other URL returns an empty dictionary.

Code that doesn't work:

from urllib.parse import urlparse
from urllib.parse import parse_qs

url = 'https://newassets.hcaptcha.com/captcha/v1/000919d/static/hcaptcha.html#frame=checkbox&id=0d4abkdnbvpa&host=2captcha.com&sentry=true&reportapi=https%3A%2F%2Faccounts.hcaptcha.com&recaptchacompat=off&custom=false&hl=ru&tplinks=on&sitekey=41b778e7-8f20-45cc-a804-1f1ebb45c579&theme=light&origin=https%3A%2F%2F2captcha.com'

parsed_url = urlparse(url)
captured_value = parse_qs(parsed_url.query)

print(captured_value)

Output:

{}

The code that works:

from urllib.parse import urlparse
from urllib.parse import parse_qs

url = 'http://foo.appspot.com/abc?def=ghi'

parsed_url = urlparse(url)
captured_value = parse_qs(parsed_url.query)

print(captured_value)

Output:

{'def': ['ghi']}

bigkeefer · Accepted Answer · 2023-02-13T11:20:23.823

In your first URL there is no question mark, so it has no query string at all. (That is why parsed_url.query is empty and parse_qs returns an empty dict.)

Instead, the arguments are in the URL fragment (for hcaptcha.html), because they follow the # in the URL. urlparse exposes that part as parsed_url.fragment, not parsed_url.query. A fragment is meant for use by JavaScript in the browser (in this case hcaptcha.html will have JS that reads these arguments to change behaviour on that page) and is not sent to the backend/server.
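As a minimal sketch of one way to read those values without editing the URL: the fragment here happens to use the same key=value&key=value layout as a query string, so parse_qs can be pointed at parsed_url.fragment. The URL below is a shortened version of the one in the question (some parameters dropped for readability), and the names fragment_values and sitekey are just illustrative.

```python
from urllib.parse import urlparse, parse_qs

# Shortened version of the URL from the question; note the '#' and no '?'.
url = ('https://newassets.hcaptcha.com/captcha/v1/000919d/static/hcaptcha.html'
       '#frame=checkbox&id=0d4abkdnbvpa&host=2captcha.com'
       '&sitekey=41b778e7-8f20-45cc-a804-1f1ebb45c579&theme=light')

parsed_url = urlparse(url)

# Everything after '#' lands in .fragment, so .query stays empty.
print(repr(parsed_url.query))  # ''

# The fragment is formatted like a query string, so parse_qs can read it.
fragment_values = parse_qs(parsed_url.fragment)

# parse_qs maps each key to a list of values; take the first one.
sitekey = fragment_values['sitekey'][0]
print(sitekey)  # 41b778e7-8f20-45cc-a804-1f1ebb45c579
```

This only works because this particular fragment is formatted like a query string; fragments in general can hold anything.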

edited Feb 13 '23 at 11:20

answered Feb 13 '23 at 11:05


Thank you. I changed the '#' to '?' and now everything works. Since I don't need the URL itself, but only the 'sitekey' value from it, this is the solution for me. – Eugene Feb 13 '23 at 12:28
