
I have this simple code:

import requests
r = requests.get('https://yahoo.com')
print(r.url)

Which after executing, prints:

https://uk.yahoo.com/?p=us

I want to see:

  1. How many redirects happened before arriving at https://uk.yahoo.com/?p=us? (Clearly there was at least one, since I originally requested https://yahoo.com.)

  2. I also want to save the content of each page, not only the last one. How can I do this?

jonrsharpe
user9371654
  • You want `requests` to *not* automatically follow the redirects, so you can see each page in the chain. Then you can keep manually following them until you get to the final result. – jonrsharpe Feb 28 '19 at 08:18

1 Answer


Use response.history. From the documentation...

The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.

So, to get the number of intermediate URLs, you could do something like:

import requests

response = requests.get('https://yahoo.com')
print(len(response.history))  # number of redirect responses before the final one

And to get what those URLs actually were and what their responses contain, you could do:

for resp in response.history:
    print(resp.url, resp.text)
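Note that response.history holds only the redirect responses; the final page is the response object itself, and the bodies of the redirect responses are often short or empty. To save every page in the chain, something like the following might work. This is only a sketch: the helper names (chain_of, save_chain, fetch_and_save), the file naming scheme, and the "pages" directory are my own choices for illustration, not part of requests.

```python
from pathlib import Path


def chain_of(response):
    """Return every response in the chain: the redirects, then the final page."""
    # response.history holds only the redirect responses, oldest first;
    # the final response is the object returned by requests.get() itself.
    return list(response.history) + [response]


def save_chain(response, out_dir):
    """Write each response body to out_dir; return (status, url, path) tuples."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, resp in enumerate(chain_of(response)):
        # Hypothetical naming scheme: position in the chain plus status code.
        path = out / f"{index:02d}_{resp.status_code}.html"
        path.write_bytes(resp.content)  # raw bytes, so no decoding is needed
        saved.append((resp.status_code, resp.url, path))
    return saved


def fetch_and_save(url, out_dir="pages"):
    """Fetch url with requests (redirects followed) and save the whole chain."""
    import requests  # third-party; imported here so the helpers above need no network

    response = requests.get(url, timeout=10)
    print(len(response.history), "redirect(s) before", response.url)
    return save_chain(response, out_dir)
```

Calling fetch_and_save('https://yahoo.com') should then leave one file per response in the pages directory, with no extra requests made.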

If needed, you can also submit a new request to the intermediate URLs with the optional parameter allow_redirects set to False:

r = requests.get(resp.url, allow_redirects=False)
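If you would rather walk the chain yourself, as suggested in the comments, a loop along these lines could do it. Treat it as a rough sketch: follow_manually, the get parameter, and the max_hops limit are names I made up for illustration, and the loop is simplified (it always issues a GET and does not carry cookies between hops).

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_manually(url, get=None, max_hops=10):
    """Follow redirects one hop at a time and return every response seen."""
    if get is None:
        import requests  # third-party; only needed when no fetcher is passed in

        get = requests.get
    responses = []
    for _ in range(max_hops + 1):
        # allow_redirects=False makes requests return the redirect itself.
        resp = get(url, allow_redirects=False)
        responses.append(resp)
        location = resp.headers.get("Location")
        if resp.status_code not in REDIRECT_CODES or not location:
            return responses  # reached a page that does not redirect
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError(f"more than {max_hops} redirects")
```

With the default fetcher, len(follow_manually('https://yahoo.com')) - 1 gives the number of redirects, and each item in the returned list has its own status_code, headers and text.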

JoshG
  • Note that you'd have to not follow the redirects when making the intermediate URL requests. – jonrsharpe Feb 28 '19 at 08:27
  • @AndroidNoobie Why is it necessary to use urllib? Isn't there a way to get the page content using requests? – user9371654 Feb 28 '19 at 08:31
  • Yes, `r = requests.get(url, allow_redirects=False)`. I'll update my answer, even though this is marked as a duplicate. – JoshG Feb 28 '19 at 08:33
  • To get the intermediate pages' content, instead of calling `r = requests.get(resp.url, allow_redirects=False)` for each history item `resp`, can't I use `resp.text`, since each history item is already a response object? – user9371654 Mar 05 '19 at 15:05
  • I think there is no need to submit a new request for each history item. They are already response objects, so you can just extract the data from them: inside `for resp in response.history`, you'd just do `print(resp.text)`. No need for a new `requests.get` per URL. Please update the answer or correct me. – user9371654 Mar 05 '19 at 16:39
  • Answer updated. – JoshG Mar 05 '19 at 18:36