How to add a newline after every occurrence of a pattern like ".[xxx]" in a string in Python

Question

I have the following string :

It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia.[142] The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015.[143]

I am trying to replace every occurrence of a pattern like .[xxx] with .[xxx] \n,

where each x is a digit.

I am taking help from different Stack Overflow answers, such as:

Python insert a line break in a string after character "X"

Regex: match fullstop and one word in python

import re
str = "It reported the proportion of the edits made from America was 51% 
for the Wikipedia, and 25% for the simple Wikipedia.[142] The Wikimedia 
Foundation hopes to increase the number in the Global South to 37% by 
2015.[143] "
x = re.sub("\.\[[0-9]{2,5}\]\s", "\.\[[0-9]{2,5}\]\s\n",str)
print(x)

I expect the following output:

It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia.[142]                          
The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015.[143]”

But I am getting:

It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia\\.\[[0-9]{2,5}\]\s   The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015\\.\[[0-9]{2,5}\]\s

Andrej Kesely · Answer 1 · 2019-07-05T08:12:49.710

1

You probably want to use a capturing group and a backreference in re.sub. You also don't need to escape the replacement string, because it is not interpreted as a regular expression (regex101):

import re
s = '''It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia.[142] The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015.[143] '''
x = re.sub(r'\.\[([0-9]{2,5})\]\s', r'.[\1] \n', s)
print(x)

Prints:

It reported the proportion of the edits made from America was 51% for the Wikipedia, and 25% for the simple Wikipedia.[142] 
The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015.[143]

edited Jul 05 '19 at 08:12

answered Jul 05 '19 at 06:37

Andrej Kesely


What exactly is r'.[\1] \n' doing? Please explain. – Jul 05 '19 at 07:20
@AlmightyHeathcliff `\1` is a reference to the first capturing group, in this case `([0-9]{2,5})` – Andrej Kesely Jul 05 '19 at 07:33
Thank you; I would like to know why the code is adding a new line after the word **Wikimedia**. I would like to have this: > The Wikimedia Foundation hopes to increase the number in the Global South to 37% by 2015.[143]” – Noor Jul 05 '19 at 08:04
@Noor Because of the formatting of the input data. I updated my answer. – Andrej Kesely Jul 05 '19 at 08:13
@AndrejKesely, I used the regular expression that \1 references in the replacement, and it then prints the regular expression itself. Aren't these two things supposed to be the same? – Jul 05 '19 at 08:47
@AlmightyHeathcliff No, it's not the same when you use a regular expression in the replacement string. You need to use the reference, or you could use a function in `re.sub` as well (depends on your case) – Andrej Kesely Jul 05 '19 at 09:11
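
For reference, the "function in `re.sub`" option mentioned in the last comment could look roughly like the sketch below. The names `add_newline` and `text` are made up for illustration, and unlike the answer's replacement string this version drops the space before the newline.

```python
import re

text = (
    "It reported the proportion of the edits made from America was 51% "
    "for the Wikipedia, and 25% for the simple Wikipedia.[142] The Wikimedia "
    "Foundation hopes to increase the number in the Global South to 37% by "
    "2015.[143] "
)

def add_newline(match):
    # match.group(1) holds the digits captured by ([0-9]{2,5}),
    # i.e. the same text that \1 refers to in a replacement string.
    return ".[" + match.group(1) + "]\n"

# re.sub calls add_newline once per match and inserts its return value.
result = re.sub(r"\.\[([0-9]{2,5})\]\s", add_newline, text)
print(result)
```

A function is mostly useful when the replacement depends on the matched text in a way a plain backreference cannot express; for this question the backreference is probably enough.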

score 1 · Answer 2 · answered Jul 05 '19 at 06:56


You may use

(\.\[[^][]*\])\s*

And replace this with \1\n, see a demo on regex101.com.

This reads

(
    \.\[   # ".[" literally
    [^][]* # neither "[" nor "]" 0+ times
    \]     # "]" literally
)\s*       # consume any whitespace that follows, if present
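
In Python, the commented pattern above can be used more or less as written with the `re.VERBOSE` flag. This is only a sketch of how it might be applied to the string from the question; the variable names are illustrative.

```python
import re

# re.VERBOSE lets the pattern keep its layout and comments.
pattern = re.compile(r"""
    (
        \.\[     # a literal period followed by an opening bracket
        [^][]*   # any characters other than brackets, zero or more times
        \]       # a literal closing bracket
    )
    \s*          # any whitespace after the reference, if present
""", re.VERBOSE)

text = (
    "It reported the proportion of the edits made from America was 51% "
    "for the Wikipedia, and 25% for the simple Wikipedia.[142] The Wikimedia "
    "Foundation hopes to increase the number in the Global South to 37% by "
    "2015.[143] "
)

# \1 puts the captured ".[...]" back; the matched whitespace is replaced by "\n".
result = pattern.sub(r"\1\n", text)
print(result)
```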

answered Jul 05 '19 at 06:56

Jan


The problem is that it also adds a new line after "[142]" even if there is no whitespace after it. For example, it adds a new line after [142] if the string is "Wikipedia.[142][152]" – Noor Jul 05 '19 at 11:47
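
One possible way around this is to require at least one whitespace character (`\s+` instead of `\s*`). The sketch below also shows a variant that keeps a run such as ".[142][152]" together and breaks after the last reference; that behaviour is an assumption about what is wanted, and the sample string is invented for illustration.

```python
import re

# \s+ requires at least one whitespace character, so a reference that is
# directly followed by another bracket is left alone.
strict = re.compile(r"(\.\[[^][]*\])\s+")

# Assumption: a run like ".[142][152]" should stay together, with the
# newline placed after the last reference in the run.
grouped = re.compile(r"(\.(?:\[[^][]*\])+)\s+")

# Hypothetical sample containing two adjacent references.
sample = "Wikipedia.[142][152] The Wikimedia Foundation hopes.[143] More text."

strict_result = strict.sub(r"\1\n", sample)
grouped_result = grouped.sub(r"\1\n", sample)

print(strict_result)
print(grouped_result)
```

With `strict`, nothing is inserted around ".[142][152]" at all, because the only reference preceded by a period is not followed by whitespace; `grouped` treats the whole run as one unit.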

score 1 · Answer 3 · answered Jul 05 '19 at 07:19


Use findall() to get a list of the matching substrings. Then you can replace each one with the original substring + '\n'.
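
A rough sketch of what this could look like, assuming the same digit pattern as in the question (the variable names are illustrative):

```python
import re

text = (
    "It reported the proportion of the edits made from America was 51% "
    "for the Wikipedia, and 25% for the simple Wikipedia.[142] The Wikimedia "
    "Foundation hopes to increase the number in the Global South to 37% by "
    "2015.[143] "
)

# Find every ".[digits]" reference in the text.
matches = re.findall(r"\.\[[0-9]{2,5}\]", text)

result = text
# set() makes sure a reference that occurs several times is handled once,
# since str.replace already replaces all of its occurrences.
for ref in set(matches):
    result = result.replace(ref, ref + "\n")

print(result)
```

Note that the whitespace after each reference is left in place here, so each new line starts with a space. A single re.sub call with a capturing group avoids that and does the work in one pass.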

answered Jul 05 '19 at 07:19

Junior_K27

