
For this example:

input_text = "This is a sentence. [balise]Special Word.[/balise] Another sentence here."

I want to split this string using [balise] as the start delimiter and [/balise] as the end delimiter, but I also want to keep (if possible) the delimiters together with the words between them. I am trying to get the following output:

output_text = ["This is a sentence. " , "[balise]Special Word[/balise]" , "Another sentence here."]

I tried many ways (with regex) without success, and didn't find any solutions online. How can I do this?

TacoScript

1 Answer


Using re.findall

>>> import re
>>> re.findall(r"(?:(?!\[balise\]).)+|\[balise\].*?\[/balise\]", "This is a sentence. [balise]Special Word.[/balise] Another sentence here.")
['This is a sentence. ', '[balise]Special Word.[/balise]', ' Another sentence here.']

Some explanation:

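  • (?!\[balise\]) is a negative lookahead: it means "not followed by \[balise\]" and checks the position without consuming any characters
  • (?:(?!\[balise\]).)+ means "one or more characters, none of which starts a \[balise\]", i.e. all the plain text up to the next \[balise\]
  • .*? means "zero or more characters (.*), but in lazy mode", so the match stops at the first \[/balise\] instead of the last one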
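For readability, the same findall pattern can be spelled out with re.VERBOSE so each alternative carries its own comment. This is only a sketch: the function name split_keep_tags is hypothetical, and adding re.DOTALL is an assumption on my part (by default "." does not match a newline, so without it a tagged span or plain chunk could not cross a line break).

```python
import re

# Same pattern as the one-liner above, written out with re.VERBOSE.
# re.DOTALL is an added assumption: it lets "." also match newlines.
TAGGED_OR_PLAIN = re.compile(
    r"""
    (?:(?!\[balise\]).)+        # plain text: one or more chars, none starting a [balise] tag
    |                           # or
    \[balise\].*?\[/balise\]    # tagged span: opening tag, shortest content, closing tag
    """,
    re.VERBOSE | re.DOTALL,
)


def split_keep_tags(text: str) -> list[str]:
    """Hypothetical helper: return plain chunks and [balise]...[/balise] chunks in order."""
    return TAGGED_OR_PLAIN.findall(text)


print(split_keep_tags("This is a sentence. [balise]Special Word.[/balise] Another sentence here."))
# ['This is a sentence. ', '[balise]Special Word.[/balise]', ' Another sentence here.']
```

Because the pattern has no capturing groups, findall returns the whole match for each alternative, which is why the tags stay attached to the words between them.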

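Using re.split (because the whole pattern is wrapped in a capturing group, re.split also keeps each matched [balise]...[/balise] span in the result)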

>>> re.split(r"(\[balise\].*?\[/balise\])", "This is a sentence. [balise]Special Word.[/balise] Another sentence here.")
['This is a sentence. ', '[balise]Special Word.[/balise]', ' Another sentence here.']
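One caveat with the re.split version: when a tagged span sits at the very start or end of the string, or two tagged spans are adjacent, re.split puts empty strings between them in the result. A minimal sketch that filters those out, assuming empty pieces are unwanted (the helper name split_keep_tags is hypothetical, and re.DOTALL is again an added assumption so tags may span lines):

```python
import re

# Capturing group: re.split returns the matched [balise]...[/balise] spans too.
# re.DOTALL (assumption) lets the tagged content contain newlines.
TAG_SPLIT = re.compile(r"(\[balise\].*?\[/balise\])", re.DOTALL)


def split_keep_tags(text: str) -> list[str]:
    """Hypothetical helper: split on tagged spans, keep them, drop empty pieces."""
    # re.split yields "" before a leading tag, after a trailing tag,
    # and between two adjacent tags; the comprehension removes those.
    return [part for part in TAG_SPLIT.split(text) if part]


print(split_keep_tags("[balise]a[/balise][balise]b[/balise]"))
# ['[balise]a[/balise]', '[balise]b[/balise]']
```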
logi-kal