The code below queries the Wikipedia API for pages in the "Physics" category and converts the response into a Python dictionary.
import ast
import requests

# Ask the MediaWiki API for up to 500 members of Category:Physics.
url = "https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Physics&cmlimit=500&cmcontinue="
response = requests.get(url)
text = response.text
# Parse the response text into a Python dictionary.
data = ast.literal_eval(text)
Here is one of the results returned by the Wikipedia API:
{
"pageid": 50724262,
"ns": 0,
"title": "Blasius\u2013Chaplygin formula"
},
The Wikipedia page that "Blasius\u2013Chaplygin formula" corresponds to is https://en.wikipedia.org/wiki/Blasius–Chaplygin_formula.
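For context, the `\u2013` in the title is the JSON escape for the en dash (U+2013). Here is a minimal sketch of how that escape is decoded, assuming the response text is parsed with the standard `json` module. The `raw` string is a hand-copied sample of the result above, not a live API call:

```python
import json

# Hand-copied sample of one API result. The raw string keeps the backslash
# escape literal, the way it appears in the response body.
raw = r'{"pageid": 50724262, "ns": 0, "title": "Blasius\u2013Chaplygin formula"}'

page = json.loads(raw)
title = page["title"]

# The JSON parser turns the six-character escape into a single en dash,
# so the parsed title no longer contains a backslash.
print(title)
print("\\" in title)
```

If the title still contains a literal backslash at the point where the URL is built, the escape was probably never decoded.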
I want to use the "title" to download pages from Wikipedia. I replaced all spaces with underscores, but the request fails. This is what I am doing:
import requests
url = "https://en.wikipedia.org/wiki/Blasius\u2013Chaplygin_formula"
response = requests.get(url)
This gives me:
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://en.wikipedia.org/wiki/Blasius%5Cu2013Chaplygin_formula
How do I change the title Blasius\u2013Chaplygin formula into a URL that can be successfully called by requests?
When I tried to insert the Wikipedia link into this question on Stack Overflow, Stack Overflow automatically converted it to https://en.wikipedia.org/wiki/Blasius%E2%80%93Chaplygin_formula.
When I did:
import requests
url = "https://en.wikipedia.org/wiki/Blasius%E2%80%93Chaplygin_formula"
response = requests.get(url)
it was successful, so I am looking for a Python library that performs this kind of conversion.
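The conversion described here is percent-encoding of the title's UTF-8 bytes. One possible sketch uses `urllib.parse.quote` from the standard library. The helper name `wiki_url` is hypothetical, and the sketch assumes the title has already been decoded into a normal Python string containing the real en dash:

```python
from urllib.parse import quote


def wiki_url(title: str) -> str:
    """Build an English Wikipedia article URL from a page title (hypothetical helper)."""
    # Wikipedia article paths use underscores in place of spaces.
    path = title.replace(" ", "_")
    # quote() percent-encodes the UTF-8 bytes of non-ASCII characters
    # and leaves letters, digits, and underscores unchanged.
    return "https://en.wikipedia.org/wiki/" + quote(path)


url = wiki_url("Blasius\u2013Chaplygin formula")
print(url)  # https://en.wikipedia.org/wiki/Blasius%E2%80%93Chaplygin_formula
```

The resulting string can then be passed to `requests.get(url)` in the same way as the hand-encoded URL above.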