I've been searching for a while but haven't found an answer.

I'm trying to fetch the contents of this webpage in Python and save them to a variable. (Link to the website)

Any help would be great. If it's unclear what I mean, ask in the comments; I'm not sure how to word this well.

Please don't let me down stackoverflow, I know you can do it ;)

  • I don't get what you need... If you want just a dump of the site, you can curl it: `curl http://rivalregions.com/rss/all > site` from terminal will store all the site in a file called `site`. – ingroxd Nov 09 '17 at 23:32
  • That just gives me invalid syntax. –  Nov 09 '17 at 23:36
  • Which OS are you running? On Windows you have to download and install it manually. On Linux, depending on your distro, you may need to install it first as well. – ingroxd Nov 09 '17 at 23:44
  • Right at this moment in time I'm using kali linux. But I usually use Windows 10. –  Nov 09 '17 at 23:49
  • `apt install curl`, then `curl http://rivalregions.com/rss/all > file` – ingroxd Nov 09 '17 at 23:51
  • `Reading package lists... Done Building dependency tree Reading state information... Done curl is already the newest version (7.55.1-1)`. It is already installed... –  Nov 09 '17 at 23:53
  • @NathanWatson He is suggesting that you do the task on the Linux command line, which does belong more [on Unix&Linux](https://unix.stackexchange.com/) than here. – bgse Nov 09 '17 at 23:59
  • Look at [the documentation](https://docs.python.org/3/howto/urllib2.html#fetching-urls), it is in there. – bgse Nov 10 '17 at 00:04
  • Possible duplicate of [How can I read the contents of an URL with Python?](https://stackoverflow.com/questions/15138614/how-can-i-read-the-contents-of-an-url-with-python) – bgse Nov 10 '17 at 00:05
  • @NathanWatson Please mark my answer as correct if it worked for you. – alexisdevarennes Nov 15 '17 at 01:18

2 Answers

You'll want to install requests. If you haven't used pip before, read up on it and install it first.

pip install requests

then in your code:

import requests

url = "http://rivalregions.com/rss/all"

req = requests.get(url)

# requests.get does not raise on HTTP error codes, so check the status explicitly.
if req.status_code == 200:
    html = req.text
else:
    print('Could not retrieve: %s, err: %s - status code: %s' % (url, req.text, req.status_code))
    html = None
alexisdevarennes

You don't need to install requests to do this in Python 3. The code below was tested with Python 3.6.10.

import urllib.error
import urllib.request

def print_some_url():
    try:
        with urllib.request.urlopen('http://www.python.org/') as f:
            # Decode the raw bytes so the page is stored as a str
            a_variable = f.read().decode('utf-8')
            print(a_variable)
    except urllib.error.URLError as e:
        print(e.reason)

print_some_url()
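
Since the question asks for the page in a variable rather than printed, the same idea can be wrapped in a function that returns the text. This is only a minimal sketch using the standard library: the name `fetch_text`, the 10-second timeout, and the UTF-8 fallback are my own assumptions, not something from the question.

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10.0, default_encoding="utf-8"):
    """Return the body at `url` as a str, or None if the request fails.

    `fetch_text` is a hypothetical helper name; the timeout and the
    fallback encoding are assumed defaults, adjust them as needed.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Use the charset declared in the Content-Type header if there
            # is one, otherwise fall back to the assumed default.
            charset = response.headers.get_content_charset() or default_encoding
            # errors="replace" keeps going if a few bytes do not decode.
            return response.read().decode(charset, errors="replace")
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP failures land here too.
        print("Could not retrieve %s: %s" % (url, exc.reason))
        return None
```

Calling `page = fetch_text("http://rivalregions.com/rss/all")` should then leave the feed contents in `page`, or `None` if the request failed, so check for `None` before using it.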
Freddie