Python: regex matching the opposite of what's desired

Question

I want my regex to find a URL so that I can turn it into an HTML link. The regex will be used on links that look like the following: www.site.extension and https://site.extension. The regex is \S*\.?w{3}\.\S+\.\S+ and it gives the desired result on https://regexr.com/. In my Python script, however, I get the opposite of what I want: everything that isn't a link is treated as if it were one, and the links themselves aren't found.

The Python code is:

import re

testbestand = """TESTBESTAND

Div1 kjaskdjfiudhgjnkcvdnbk djskj ij g ijg jkdfnbdiiji jj iikdafnbn ojedfkj giqw34
Akdjfkjasdf

Div2 aksjdfkj sadfkjg sdkjiew kvckjeri cdkj sdkeridk erkire

Div3 kajkdjfkjakdjgsdghijskdg

Div 4 www.link.com

Div5
Table Left  Table Right
Table Left 2    Table Right 2
Table Left 3    Table Right 3
"""

fileContent = testbestand
toAddToFile = ""

#find links
pattern = re.compile(r'\S*\.?w{3}\.\S+\.\S+')
matches = re.split(pattern, fileContent)

for match in matches:
    match = match.strip()

    if len(match) > 0:
        #TODO change to 'edit' file, instead of adding to it
        test = """<a href=" """ + match + """>" """ + match + "</a>"
        print(test)

        toAddToFile += """<a href=" """ + match + """>" """ + match + "</a>"

Thanks in advance for any help! If more info or code is needed, I'll provide it straight away.

If you want to match, why do you split? – Wiktor Stribiżew Aug 13 '18 at 07:31

score 3 · Accepted Answer · answered Aug 13 '18 at 07:32

That's because you use re.split, which is designed to split the text at each match of the pattern, so it returns the pieces between the links rather than the links themselves. Instead, use `re.findall`:

pattern = re.compile(r'\S*\.?w{3}\.\S+\.\S+')
matches = pattern.findall(fileContent)
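
To make the difference concrete, here is a minimal sketch that runs both calls on the same input. The pattern is the one from the question; the `sample` string and the variable names `non_links` and `links` are made up for illustration.

```python
import re

# Pattern taken from the question; the sample text is a shortened, made-up input.
pattern = re.compile(r'\S*\.?w{3}\.\S+\.\S+')
sample = "Div 4 www.link.com\n\nDiv5 no link here"

# split() cuts the text at every match and returns the pieces in between,
# i.e. everything that is NOT a link.
non_links = pattern.split(sample)

# findall() returns the matched substrings themselves.
links = pattern.findall(sample)

print(non_links)  # ['Div 4 ', '\n\nDiv5 no link here']
print(links)      # ['www.link.com']
```

This is why the loop in the question ended up wrapping the non-link text in anchor tags: it was iterating over the output of `split`.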

L3viathan

This is perfect, thanks! I'll mark it as the correct answer when the timer is done. It seems I misunderstood re.split. – René Steeman Aug 13 '18 at 07:34
`re.findall` won't preserve the text that isn't matched. – blhsing Aug 13 '18 at 07:37
That's good for now, although I might change the behavior of the code in the future, so I'll keep it in mind. Thanks for pointing it out! – René Steeman Aug 13 '18 at 07:39

score 2 · Answer 2 · answered Aug 13 '18 at 07:36

You should use re.sub instead of re.split:

toAddToFile = re.sub(r'(\S*\.?w{3}\.\S+\.\S+)', r'<a href="\1">\1</a>', fileContent)
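
As a rough illustration of what this does, the sketch below applies the same substitution to a short, made-up `sample` string (the name `linked` is also just for this example). The surrounding text is kept as-is and only the matched link is wrapped.

```python
import re

# Pattern from the question, wrapped in a capturing group so that \1
# in the replacement refers to the matched link.
pattern = re.compile(r'(\S*\.?w{3}\.\S+\.\S+)')
sample = "Div 4 www.link.com\nDiv5"

# sub() replaces each match in place and leaves the rest of the text untouched.
linked = pattern.sub(r'<a href="\1">\1</a>', sample)

print(linked)
# Div 4 <a href="www.link.com">www.link.com</a>
# Div5
```

Note that the href is just the matched text, so a link written without a scheme (such as www.link.com) would probably need one prepended before a browser treats it as an absolute URL.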

blhsing

91,368
6
71
106

Python: regex matching the opposite of what's desired

2 Answers2