Get HomePage from string of website address

Question

I have a list of strings containing company websites.

This is an example: ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']

I need to replace each one with its homepage.

The result must be: ['www.apple.com','go-sharp.ai','http.titos.com.br']

Could you suggest the best way to do it, please (maybe some API)?

Thank you for your time!

Hi, [urllib](https://docs.python.org/3/library/urllib.parse.html) has a lot of facilities for handling URLs. — Glauco, Nov 09 '21 at 13:15

score 1 · Answer 1 · answered Nov 09 '21 at 13:16

One approach: use the split method

array = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']
result = []
for ar in array:
    # Keep everything before the first "/" (assumes no "http://" prefix)
    result.append(ar.split("/")[0])
print(result)

Output: ['www.apple.com', 'go-sharp.ai', 'http.titos.com.br']

answered Nov 09 '21 at 13:16

Mani

score 1 · Answer 2 · answered Nov 09 '21 at 13:20

With your example, you can easily write a simple parser like this:

sites = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']
for s in sites:
    print(s.split('/')[0])

As @Be Chiller Too said, you can also use urllib.parse.urlparse, but make sure your URLs are well formed. As the docs say:

Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.

cf. https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse
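
To illustrate the '//' point, here is a minimal sketch of the urlparse route. The helper name `homepage` is my own, not part of urllib, and the last entry in `sites` is an extra input I added to show a URL with a scheme. It assumes each string is either a bare host/path or a full URL.

```python
from urllib.parse import urlparse

def homepage(url):
    """Return the host part of url (hypothetical helper, not a urllib function)."""
    # urlparse only fills in netloc when the host is introduced by '//',
    # so prefix bare inputs such as 'www.apple.com/about' with '//'.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlparse(url).netloc

sites = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br',
         'https://www.apple.com/about']  # last entry added for illustration
print([homepage(s) for s in sites])
```

Without the '//' prefix, urlparse('www.apple.com/about').netloc is an empty string, because the whole input is treated as a path. Note that netloc also keeps a port or user info if the URL has one.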
