1

I am trying to find a substring which is basically a link to any website. The idea is that if a user posts something, the link will be extracted and assigned to a variable called web_link. My current code is as follows:

post = ("You should watch this video https://www.example.com if you have free time!")
web_link = post[post.find("http" or "www"):post.find(" ", post.find("http" or "www"))]

The code works perfectly if there is a space after the link. However, it fails if the link is at the very end of the post. For example:

post = ("You should definitely watch this video https://www.example.com")

In that case post.find(" ", ...) cannot find a space and returns -1, so the slice drops the last character and web_link becomes "https://www.example.co".

I am trying to find a solution that does not involve an if statement if possible.

Alperen
Milos
  • Side comment: `if` isn't a function. – Neo Oct 13 '17 at 11:27
  • you should be using regex otherwise your function won't be very robust... a simple "python extract url from string" google search would solve your problem – ifma Oct 13 '17 at 11:30

2 Answers

0

This fails because find returns -1 when no space is found, and the slice treats -1 as an end index meaning "stop one character before the end of the string".
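To make the behaviour concrete, here is a minimal sketch using the post from the question. It only searches for "http", which is a simplification for illustration.

```python
post = "You should definitely watch this video https://www.example.com"

start = post.find("http")
end = post.find(" ", start)  # no space after the link, so find returns -1

print(end)              # -1
print(post[start:end])  # https://www.example.co  (last character dropped)
print(post[start:])     # https://www.example.com (slice to the end instead)
```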

As ifma pointed out, the best way to achieve this is with a regular expression. Something like:

re.search(r"(?:https?://|www\.)\S+", post).group(0)
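If the post contains no link at all, re.search returns None and calling .group(0) raises an AttributeError. A possible guard is sketched below. The names URL_PATTERN and extract_link are hypothetical, and the pattern is an assumption that simply takes everything up to the next whitespace, so trailing punctuation such as "!" would be included in the result.

```python
import re

# Assumed pattern: a link starts with http://, https:// or www.
# and runs until the next whitespace character.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")

def extract_link(post):
    """Return the first link found in post, or None if there is none."""
    match = URL_PATTERN.search(post)
    return match.group(0) if match else None

print(extract_link("You should watch this video https://www.example.com"))
print(extract_link("You should watch this video www.example.com if you have free time!"))
print(extract_link("No link in this post"))
```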
Chris Edgington
  • This doesn't include web links starting "www". Actually, if you use a string without "https", you'll get this error: `AttributeError: 'NoneType' object has no attribute 'group'` – Alperen Oct 13 '17 at 14:17
  • Yes fair enough. Updated to take that into account. – Chris Edgington Oct 13 '17 at 15:00
0

Use regex. I've made a small change to the solution here.

import re

def func(post):
    return re.search(r"(?:(?:https?|ftp)://)?([\w-]+(?:\.[\w-]+)+)([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?", post).group(0)

print(func("You should watch this video www.example.com if you have free time!"))
print(func("You should watch this video https://www.example.com"))

Output:

www.example.com
https://www.example.com

That said, using `if` is simpler and more obvious:

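def func(post):
    # '"http" or "www"' evaluates to just "http", so look up each prefix separately.
    starts = [i for i in (post.find("http"), post.find("www")) if i != -1]
    start = min(starts)  # raises ValueError if the post contains no link
    finish = post.find(" ", start)
    return post[start:] if finish == -1 else post[start:finish]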
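Since the question asks for a way to avoid a conditional, one possible sketch is to slice from the start of the link and keep only the first whitespace-separated token. The name extract_link_no_if is hypothetical, and the sketch assumes the post contains a link beginning with "http". It does not handle posts without a link.

```python
def extract_link_no_if(post):
    # Assumes "http" occurs in the post; find would return -1 otherwise.
    start = post.find("http")
    # split() with no separator splits on any run of whitespace, so the
    # first token is the link whether or not anything follows it.
    return post[start:].split(maxsplit=1)[0]

print(extract_link_no_if("You should watch this video https://www.example.com if you have free time!"))
print(extract_link_no_if("You should definitely watch this video https://www.example.com"))
```

Both calls print https://www.example.com, because the end of the string and a following space are treated the same way by split.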
Alperen