-1

I need to find all website addresses in the input text and print them in the order they appear, each on a new line. An address starts with "https://", "http://", or "www.".

I used split on the string, but I can't work out how to print only the words that start with "www". Can someone explain how I can solve this?

Sample Input 1:

WWW.GOOGLE.COM uses 100-percent renewable energy sources and www.ecosia.com plants a tree for every 45 searches!

Sample Output 1:

WWW.GOOGLE.COM

www.ecosia.com

text = input()
text = text.lower()
words = text.split(" ")
for word in words:
ewokx

2 Answers

0

What I would do is scan for the prefix, since in your sample every URL begins with "www." and ends at a space, collect each match in a list, and then print the list. Python has many string functions that could shorten this, but I don't know many of them.

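text = "WWW.GOOGLE.COM uses 100-percent renewable energy sources and www.ecosia.com plants a tree for every 45 searches!"
prefixes = ("https://", "http://", "www.")
all_urls = []
i = 0
while i < len(text):
    # An address must begin at the start of a word.
    at_word_start = i == 0 or text[i - 1] == " "
    # Compare in lowercase; 8 is the length of the longest prefix.
    if at_word_start and text[i:i + 8].lower().startswith(prefixes):
        k = i
        # Advance to the next space or the end of the text.
        while k < len(text) and text[k] != " ":
            k += 1
        all_urls.append(text[i:k])  # keep the original capitalisation
        i = k
    else:
        i += 1
for url in all_urls:
    print(url)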
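
The same scan can be written more briefly with built-in string methods. The following is only a minimal sketch, assuming addresses are separated from other words by whitespace; the names `find_addresses` and `PREFIXES` are illustrative and not part of the original post.

```python
# Prefixes that mark a website address, taken from the question
# (assumed to be the complete list).
PREFIXES = ("https://", "http://", "www.")


def find_addresses(text):
    """Return the whitespace-separated words that start with a known prefix."""
    # str.split() with no argument splits on any run of whitespace.
    # The prefix check runs on a lowercased copy, so the original
    # capitalisation of each word is preserved in the result.
    return [word for word in text.split() if word.lower().startswith(PREFIXES)]


sample = ("WWW.GOOGLE.COM uses 100-percent renewable energy sources "
          "and www.ecosia.com plants a tree for every 45 searches!")
for address in find_addresses(sample):
    print(address)
```

Note that `str.startswith` accepts a tuple of prefixes, so no explicit loop over them is needed. Punctuation attached to a word (for example a trailing comma) would be kept as part of the address in this sketch.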
0

A better way is to use a regex. Many ready-made URL patterns like the one below are available online.

import re
url_regex = r"(?i)(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})"
raw_string = "WWW.GOOGLE.COM uses 100-percent renewable energy sources and www.ecosia.com plants a tree for every 45 searches!"
urls = re.findall(url_regex, raw_string)
# Print each address on its own line, in the order it appears.
for url in urls:
    print(url)
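
If the full pattern above is more than the task needs, a much shorter one may be enough. This is a sketch only, assuming an address is one of the three prefixes followed by any run of non-whitespace characters; `SIMPLE_URL` and `extract_addresses` are hypothetical names, and the pattern makes no attempt to validate domain names.

```python
import re

# Assumed definition of an address: a word boundary, one of the three
# prefixes, then everything up to the next whitespace character.
SIMPLE_URL = re.compile(r"\b(?:https?://|www\.)\S+", re.IGNORECASE)


def extract_addresses(text):
    """Return all matches in order of appearance, keeping the original case."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return SIMPLE_URL.findall(text)


sample = ("WWW.GOOGLE.COM uses 100-percent renewable energy sources "
          "and www.ecosia.com plants a tree for every 45 searches!")
for address in extract_addresses(sample):
    print(address)
```

Because `\S+` runs to the next whitespace, closing punctuation such as `)` or `,` directly after an address would be included in the match; the longer pattern shares this behaviour through its `[^\s]{2,}` tail.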
HoangYell