I am trying to extract proper URLs from a string that contains both proper and improper URLs, using regular expressions. The code should return a list of the proper URLs found in the input string. The problem is that I cannot get rid of "https://example{.com": everything I have come up with matches up to the "{" character and returns "https://example" in the results.
The code I am checking is below:
import re
text = "https://example{.com http://example.com http://example.hgg.com/da.php?=id42 http\\:example.com http//: example.com"
print(re.findall('http[s]?[://](?:[a-zA-Z0-9$-_@.&+])+', text))
So is there a good way to get all the matches while excluding those that contain bad characters (like "{")?
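One direction that might work, as a rough sketch rather than a definitive solution: split the input on whitespace and keep only the tokens that match a URL pattern in full, so a token with a bad character anywhere in it is rejected as a whole instead of being cut short. The names `URL_RE` and `find_proper_urls` are made up for illustration, and the set of allowed characters is an assumption about what counts as "proper" here.

```python
import re

# Hypothetical pattern: "http" or "https", then "://", then one or more
# characters from an assumed whitelist. "{" is not in the whitelist.
URL_RE = re.compile(r"https?://[A-Za-z0-9._~:/?#@!$&'()*+,;=%-]+")

def find_proper_urls(text):
    # Split on whitespace and keep only tokens the pattern matches entirely;
    # fullmatch rejects a token if any character falls outside the pattern.
    return [token for token in text.split() if URL_RE.fullmatch(token)]

text = "https://example{.com http://example.com http://example.hgg.com/da.php?=id42 http\\:example.com http//: example.com"
print(find_proper_urls(text))
```

This assumes URLs in the input are always separated by whitespace. If they can be embedded in other text, the splitting step would need to change.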