Replace text inside bracket with an anchor link

Question

I currently have a body of text like this:

text = "hello this [is a cool] line of text that might have [two] brackets."

What I need is to parse this text and replace each bracketed phrase with a link, so this example would end up like

text = "hello this <a href='/phrase/is a cool/'>is a cool</a> line of text that might have <a href='/phrase/two/'>two</a> brackets."

I think the regex to find everything inside brackets is \[.*?\], but I'm unsure how to do the replacement itself.

Does this help? https://stackoverflow.com/questions/11096720/replace-a-string-located-between — Milos Stojanovic, Oct 09 '21 at 21:13

score 1 · Answer 1 · answered Oct 09 '21 at 21:19

You can do this in two steps:

Get all substrings enclosed by [ and ]
Replace the content with appropriate text

>>> import re
>>> txt = "hello this [is a cool] line of text that might have [two] brackets."
>>> phrases = re.findall(r"(\[.+?\])", txt)
>>> for phrase in phrases:
...     txt = txt.replace(phrase, "<a href='/phrase/{}/'>{}</a>".format(phrase[1:-1], phrase[1:-1]))
... 
>>> txt
"hello this <a href='/phrase/is a cool/'>is a cool</a> line of text that might have <a href='/phrase/two/'>two</a> brackets."
>>>
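
As a possible variation on the loop above, the same result can be produced in one pass with re.sub and a backreference. This is only a sketch: the variable name linked is made up for illustration, and \1 in the replacement string stands for the text captured by the parentheses (the phrase without its brackets).

```python
import re

txt = "hello this [is a cool] line of text that might have [two] brackets."

# \1 is the first capture group, i.e. the phrase without the surrounding brackets.
linked = re.sub(r"\[(.+?)\]", r"<a href='/phrase/\1/'>\1</a>", txt)

print(linked)
```

Doing it in one pass also avoids calling str.replace once per phrase, which rescans the whole string each time.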

Matiiss · Accepted Answer · 2021-10-09T21:43:38.650


You can do it like this:

import re

text = "hello this [is a cool] line of text that might have [two] brackets."

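# Capture the text between [ and ] non-greedily so each bracket pair is matched on its own.
brackets = re.compile(r'\[(.*?)\]')
# Quote the href (the phrase may contain spaces) and use the /phrase/<text>/ form from the question.
new_text = brackets.sub(lambda x: f"<a href='/phrase/{x.group(1)}/'>{x.group(1)}</a>", text)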

print(new_text)

This replaces each match of the pattern with whatever the lambda returns.
x.group(1) returns the first capture group of the pattern, (.*?), which is only the text between the brackets (group 0 is the whole match, brackets included). The f-string then formats that text into the link.
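
To make the group numbering concrete, here is a small sketch using re.search on a shortened sample string (the string and the variable name match are just for illustration):

```python
import re

match = re.search(r'\[(.*?)\]', "hello this [is a cool] line")

print(match.group(0))  # whole match, brackets included: [is a cool]
print(match.group(1))  # first capture group only: is a cool
```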

To also remove any punctuation from the text inside the brackets, this code could be used (notice that the result contains none of the . characters that were between the brackets):

import re
import string

text = "hello this [is a..... cool] line of text that might have [two] brackets."


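def replace_with_link(match):
    # group(1) is the text between the brackets
    info = match.group(1)
    # drop every ASCII punctuation character from the phrase
    info = info.translate(str.maketrans('', '', string.punctuation))
    return f'<a href="/phrase/{info}/">{info}</a>'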


brackets = re.compile(r'\[(.*?)\]')
new_text = brackets.sub(replace_with_link, text)

print(new_text)
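
If the phrases may contain spaces or characters such as &, one option (not part of the original question, so treat it as an assumption about what the links need) is to percent-encode the URL part and HTML-escape the visible text instead of stripping characters. A minimal sketch, where to_link and the [R&D] sample are hypothetical:

```python
import html
import re
from urllib.parse import quote


def to_link(match):
    phrase = match.group(1)
    # Percent-encode the phrase for use in the URL path (spaces become %20).
    href = "/phrase/" + quote(phrase) + "/"
    # Escape the visible text so characters like & are valid in HTML.
    return f'<a href="{href}">{html.escape(phrase)}</a>'


text = "hello this [is a cool] line of text that might have [R&D] brackets."
new_text = re.sub(r'\[(.*?)\]', to_link, text)

print(new_text)
```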

edited Oct 09 '21 at 21:43

answered Oct 09 '21 at 21:22

Matiiss


Thank you. If the text inside the brackets has special characters like commas or periods, will this leave them in or strip them? And if it leaves them in, is there any way to strip them? – nadermx Oct 09 '21 at 21:26

@nadermx well `.` matches any character (except newline) and doesn't really exclude anything, I will edit to add how to remove all the punctuation if that is what you were looking for – Matiiss Oct 09 '21 at 21:30

@nadermx I had added the code for punctuation, I am pretty sure that I wrote a comment saying that but now I don't see it so just in case I wrote this one – Matiiss Oct 10 '21 at 12:56
