How to save webpage data to a variable in Python

Question

I've been looking around for a while to find the answer but unfortunately no luck today.

I'm trying to get the contents of this webpage and save them to a variable. (Link to the website)

Any help would be great. If you're confused about what I mean, drop a question in the comments, because personally I'm not sure how to word this well.

Please don't let me down stackoverflow, I know you can do it ;)

I don't get what you need... If you want just a dump of the site, you can curl it: `curl http://rivalregions.com/rss/all > site` from terminal will store all the site in a file called `site`. — ingroxd, Nov 09 '17 at 23:32
Which OS do you run? On Windows you have to download and install it manually. On Linux, depending on your distro, you may need to install it first as well. — ingroxd, Nov 09 '17 at 23:44
Right at this moment in time I'm using kali linux. But I usually use Windows 10. — , Nov 09 '17 at 23:49
`apt install curl`, then `curl http://rivalregions.com/rss/all > file` — ingroxd, Nov 09 '17 at 23:51
`Reading package lists... Done Building dependency tree Reading state information... Done curl is already the newest version (7.55.1-1)`. It is already installed..... — , Nov 09 '17 at 23:53
@NathanWatson He is suggesting that you do the task on the Linux command line, which does belong more [on Unix&Linux](https://unix.stackexchange.com/) than here. — bgse, Nov 09 '17 at 23:59
Look at [the documentation](https://docs.python.org/3/howto/urllib2.html#fetching-urls), it is in there. — bgse, Nov 10 '17 at 00:04
Possible duplicate of [How can I read the contents of an URL with Python?](https://stackoverflow.com/questions/15138614/how-can-i-read-the-contents-of-an-url-with-python) — bgse, Nov 10 '17 at 00:05
@NathanWatson Please mark my answer as correct if it worked for you. — alexisdevarennes, Nov 15 '17 at 01:18

score 2 · Accepted Answer · answered Nov 10 '17 at 00:16

You'll want to install requests. If you aren't familiar with pip, read up on it and install it first.

pip install requests

then in your code:

import requests

url = "http://rivalregions.com/rss/all"

req = requests.get(url)

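if req.status_code == 200:
    # The response body, decoded to a string, is now stored in a variable
    html = req.text
else:
    # print() as a function works on both Python 2.7 and Python 3
    print('Could not retrieve: %s, err: %s - status code: %s' % (url, req.text, req.status_code))
    html = None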

score 1 · Answer 2 · answered Aug 13 '20 at 11:57

You don't need to install requests to get this to work in Python 3. The code below was tested with Python 3.6.10.

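import urllib.error  # imported explicitly for the URLError handler below
import urllib.request

def print_some_url():
    try:
        with urllib.request.urlopen('http://www.python.org/') as f:
            # Read the raw bytes and decode them into a string variable
            a_variable = f.read().decode('utf-8')
            print(a_variable)
    except urllib.error.URLError as e:
        print(e.reason)

print_some_url()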
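
Since the question asks for the page in a variable rather than printed, here is a minimal sketch of the same standard-library approach that returns the text instead. The function name `fetch_text`, the 10-second default timeout, and the UTF-8 fallback are my own assumptions and are not part of the answer above.

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10):
    """Return the body of `url` as a string, or None if the request fails.

    `fetch_text` and its `timeout` default are illustrative choices.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Use the charset declared in the response headers;
            # fall back to UTF-8 when none is declared (an assumption).
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.URLError as e:
        # HTTPError is a subclass of URLError, so HTTP error statuses land here too.
        print("Could not retrieve %s: %s" % (url, e.reason))
        return None
```

You would then call it as `html = fetch_text("http://rivalregions.com/rss/all")` and check for `None` before using the result.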
