
I have a long list of incomplete website addresses, some missing a prefix like "http://www.":

pewresearch.org
narod.ru
intel.com
xda-developers.com
oecd.org

I tried:

import requests
from lxml.html import fromstring

to_check = [
"pewresearch.org",
"narod.ru",
"intel.com",
"xda-developers.com",
"oecd.org"]

for each in to_check:
    r = requests.get("http://www." + each)
    tree = fromstring(r.content)
    title = tree.findtext('.//title')
    print(title)

They returned:

Pew Research Center | Pew Research Center
Лучшие конструкторы сайтов | Народный рейтинг конструкторов для создания сайтов
Intel | Data Center Solutions, IoT, and PC Innovation
XDA Portal & Forums
Home page - OECD

It seems they all work with "http://www.", but those are not necessarily the correct addresses; for example, the right one for the first site is "https://www.pewresearch.org/".

What's the quickest way, with an online tool or Python, to find their complete and correct addresses instead of keying them one by one into a web browser? (Some might be http, some https.)


1 Answer


Write a script / short program that sends an HTTP HEAD request to each site. The server should respond with a redirect (e.g. to HTTPS). Follow each redirect until no further redirects are received; the final URL is the site's complete, correct address.

The C# HttpClient can follow redirects automatically.

For Python, see @jterrace's answer here using the requests library with the code snippet below:

>>> import requests
>>> r = requests.head('http://github.com', allow_redirects=True)
>>> r
<Response [200]>
>>> r.history
[<Response [301]>]
>>> r.url
u'https://github.com/'
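Applying the same idea to the list in the question, a minimal sketch (the final URLs printed depend on each site's redirect configuration at the time you run it, so no expected output is shown; the try/except is there because some hosts may time out or reject HEAD requests):

```python
import requests

to_check = [
    "pewresearch.org",
    "narod.ru",
    "intel.com",
    "xda-developers.com",
    "oecd.org",
]

for each in to_check:
    try:
        # Send a HEAD request and let requests follow any redirect chain.
        r = requests.head("http://www." + each, allow_redirects=True, timeout=10)
        # r.url holds the final address after all redirects (http or https).
        print(each, "->", r.url)
    except requests.RequestException as exc:
        # Network errors, timeouts, or servers that refuse HEAD land here.
        print(each, "-> request failed:", exc)
```

A few servers respond differently to HEAD than to GET; if a site returns an error for HEAD, retrying with `requests.get(..., allow_redirects=True)` and reading `r.url` gives the same information at the cost of downloading the body.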