
I have a list of strings containing company websites.

This is an example: ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']

I need to replace each of them with its homepage.

The result must be: ['www.apple.com','go-sharp.ai','http.titos.com.br']

Could you please suggest the best way to do this (maybe with some API)?

Thank you for your time!

audiotec
Hi, [urllib](https://docs.python.org/3/library/urllib.parse.html) has a lot of facilities for handling URLs. – Glauco Nov 09 '21 at 13:15

2 Answers


One approach: split each string on "/" and keep the first piece.

urls = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']
result = []
for url in urls:
    # Everything before the first "/" is the host part
    result.append(url.split("/")[0])
print(result)

Output: ['www.apple.com', 'go-sharp.ai', 'http.titos.com.br']
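The same idea fits in a list comprehension. The sketch below also strips an optional `scheme://` prefix first, because `'https://www.apple.com/about'.split('/')[0]` would return `'https:'`. The helper name `homepage` and the extra sample URL with a scheme are assumptions for illustration and are not part of the question.

```python
def homepage(url):
    """Return the host part of a URL-like string (hypothetical helper)."""
    # partition returns (url, '', '') when "://" is absent.
    head, sep, tail = url.partition("://")
    # Only treat "head" as a scheme if it contains no "/" itself;
    # otherwise the "://" belongs to the path or query, so keep the input.
    rest = tail if sep and "/" not in head else url
    # Everything before the first "/" is the host.
    return rest.split("/", 1)[0]

urls = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br',
        'https://www.apple.com/about']  # last entry is an assumed extra input
result = [homepage(u) for u in urls]
print(result)
```

This stays a plain string operation: it does not validate the host, and a port or user info in the input would be kept as part of the result.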

Mani

With your example, you can easily write a simple parser like this:

sites = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']
for s in sites:
    print(s.split('/')[0])

As @Be Chiller Too said, you can also use urllib.parse.urlparse, but make sure your URLs are well formed. As the docs say:

Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.

cf. https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse
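To illustrate the caveat quoted above, here is a minimal sketch that prepends `//` when an input has no scheme, so that `urlparse` fills in `netloc`. The helper name `netloc_of` is hypothetical, and the check for `"://"` is a simplifying assumption about what the inputs look like.

```python
from urllib.parse import urlparse

def netloc_of(url):
    """Return the network location of a URL that may lack a scheme (hypothetical helper)."""
    # urlparse only recognizes a netloc introduced by "//",
    # so add it when the input has no "scheme://" prefix.
    if "://" not in url:
        url = "//" + url
    return urlparse(url).netloc

sites = ['www.apple.com/about', 'go-sharp.ai/services', 'http.titos.com.br']

# Without "//", the whole string is treated as a path and netloc is empty.
print(repr(urlparse(sites[0]).netloc))

homepages = [netloc_of(s) for s in sites]
print(homepages)
```

Note that `netloc` keeps a port or user info if the input contains one; `urlparse(...).hostname` is the attribute to use when only the host name is wanted.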

Kaz