How can I replace occurrences of whitespace with underscores, but only in a specific part of the line/string, using Python?

source string:

* this is the line with a [[filename01 as a wiki type link]] inside

I'd like to convert to:

* this is the line with a [[filename01_as_a_wiki_type_link]] inside

In the end I came up with this:

res = re.sub(r'(?:.*)?(?<=\[\[[a-zA-Z0-9]+)\s', r'_', line)

And this is what I expected it to do:

(?:.*)? => re.sub should not do anything at all with this part, whether or not it is present in the line

(?<=\[\[[a-zA-Z0-9]+)\s => re.sub should act on the whitespace "\s" that comes after alphanumeric characters following the (escaped) square brackets.

This gives me the error "look-behind requires fixed-width pattern", meaning that I would have to know the length of the string inside the square brackets, which I don't.

My question now is: what is the right approach for this? I want to replace whitespace with underscores, but only inside a string of arbitrary length enclosed in double square brackets.
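For reference, the error is raised when the pattern is compiled, regardless of the input line, because Python's `re` module only accepts look-behinds that match a fixed number of characters, and `[a-zA-Z0-9]+` can match any number. A minimal sketch that reproduces it (the variable names `pattern` and `message` are only for illustration):

```python
import re

# The pattern from the question, unchanged
pattern = r'(?:.*)?(?<=\[\[[a-zA-Z0-9]+)\s'

try:
    re.compile(pattern)
    message = None
except re.error as exc:
    # Raised at compile time, before any input string is examined
    message = str(exc)

print(message)
```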

lievendp
  • As "sub" accepts a function as replacement you could simply match the whole double-bracketed string(s) (including brackets) and apply a simple "str.replace" on it in the replacement function. – Michael Butscher Sep 02 '20 at 15:06
  • Does this answer your question? [Python remove spaces in between braces](https://stackoverflow.com/questions/62051197/python-remove-spaces-in-between-braces), [Regex to remove spaces between '\[' and '\]'](https://stackoverflow.com/q/16644159/8967612) – 41686d6564 stands w. Palestine Sep 02 '20 at 15:07
  • @MichaelButscher: very useful, completely forgot abt that one, thanks! it works fine. – lievendp Sep 02 '20 at 17:01
  • @41686d6564 I've looked into the 2 links you refer, the first is python but only takes into account whitespaces right next to the opening and closing brackets and not inside the rest of that string which is important to my case as I don't want any whitespaces between the square brackets. The second one is javascript and the syntax confused me a little bit. – lievendp Sep 02 '20 at 18:30
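The suggestion from Michael Butscher's comment could look roughly like the sketch below. The helper name `underscore_link` is made up for illustration, and the non-greedy `.*?` is an assumption so that several links on one line are matched separately.

```python
import re

line = "* this is the line with a [[filename01 as a wiki type link]] inside"

def underscore_link(match):
    # match.group(0) is the whole "[[...]]" text, brackets included
    return match.group(0).replace(" ", "_")

# re.sub calls underscore_link once for every [[...]] it finds
res = re.sub(r"\[\[.*?\]\]", underscore_link, line)
print(res)  # * this is the line with a [[filename01_as_a_wiki_type_link]] inside
```

Note that `str.replace(" ", "_")` only touches plain spaces. If tabs or other whitespace can occur inside the brackets, `re.sub(r"\s", "_", match.group(0))` inside the helper would cover those as well.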

2 Answers

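Try the regex below to match whitespace between [[ and ]]:

(\s)+(?=[^[]*?\]\])

and replace each match with _. The lookahead only accepts whitespace that is followed by characters other than [ and then ]]. Note that it checks for the closing ]] only, not for an opening [[.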

Code

import re

a = "this is the line with a [[filename01 as a wiki type link]] inside"
# Raw string so that \s and \] reach the regex engine unchanged
print(re.sub(r"(\s)+(?=[^[]*?\]\])", "_", a))

Output

this is the line with a [[filename01_as_a_wiki_type_link]] inside

Liju
  • Do I understand this right: The regex is looking for a string of 1 or more whitespaces and if that is found, then look-ahead for a string of 0 length or more that doesn't start with "[" and ends with "]]". If such condition is met, then that whitespace is replaced with "_". I don't seem to grasp why it's (\s)+ instead of (\s+) and also why there's a "?" in the lookahead before the first "\]". Moving the + inside the group don't change anything that I can see and neither does leaving out the "?" in the lookahead. Could you @Liju point out why this is? Thanks! – lievendp Sep 02 '20 at 18:14
  • Putting the + sign inside or outside the parentheses won't make any difference here. If you want a double space to be replaced with a double underscore, remove the + sign. The ? next to * makes the * non-greedy. Check [this link](https://docs.python.org/3/library/re.html) to learn more about greedy quantifiers. – Liju Sep 02 '20 at 18:38
  • thank you @Liju this really helped a lot. I'd upvote if I could. :-) – lievendp Sep 02 '20 at 19:15
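The points discussed in these comments can be checked with a small sketch. The test string and the variable names are assumptions for illustration. Dropping the `?` gives the same result here because a lookahead only tests whether some match exists, so greedy and non-greedy succeed on exactly the same positions.

```python
import re

a = "a [[two  spaces here]] b"  # note the double space after "two"

run_lazy = r"(\s)+(?=[^[]*?\]\])"    # the pattern from the answer
run_greedy = r"(\s)+(?=[^[]*\]\])"   # same, without the ? after *
single = r"\s(?=[^[]*?\]\])"         # no +: one underscore per whitespace character

print(re.sub(run_lazy, "_", a))    # a [[two_spaces_here]] b
print(re.sub(run_greedy, "_", a))  # a [[two_spaces_here]] b
print(re.sub(single, "_", a))      # a [[two__spaces_here]] b
```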
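
import re

input_string = ('* this is the line with a '
                '[[filename01 as a wiki type link]] inside')
# Locate the [[...]] link; the raw string keeps the backslashes intact and
# the non-greedy .*? stops at the first closing ]]
match = re.search(r'\[\[.*?\]\]', input_string)
if match:
    start, end = match.span()
    # Replace spaces only inside the matched span
    print(input_string[:start] +
          input_string[start:end].replace(' ', '_') +
          input_string[end:])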
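The snippet above handles a single link per line. A possible extension to several links, which is not part of the original answer, is to walk over all matches with `re.finditer` and rebuild the string piece by piece. The function name `underscore_links` and the sample text are hypothetical.

```python
import re

def underscore_links(text):
    """Replace spaces with underscores inside every [[...]] span of text."""
    parts = []
    last_end = 0
    for match in re.finditer(r'\[\[.*?\]\]', text):
        start, end = match.span()
        parts.append(text[last_end:start])               # text before the link, unchanged
        parts.append(text[start:end].replace(' ', '_'))  # the link itself
        last_end = end
    parts.append(text[last_end:])                        # remainder after the last link
    return ''.join(parts)

print(underscore_links('see [[first page]] and [[second page]] here'))
# see [[first_page]] and [[second_page]] here
```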
eugenesqr