Regex Python: cannot replace newlines with "$1"

Question

I have the regular expression \n([\d]), which matches the following text:

Then I want to replace the matched text with the first group, $1, in Visual Studio Code. This is the result:

I want to do the same thing in Python, and I have already written this code:

import re    

file = "out FCE.txt"    
pattern = re.compile(".+")

for i, line in enumerate(open(file)):
    for match in re.finditer(pattern, line):
        print(re.sub(r"\n([\d])", r"\1", match.group()))

But that code does nothing. The result is still the same as in the first picture: the newlines in front of the lines that start with a digit are not removed. I have already read this answer, which says that Python uses \1, not $1. And yes, I want to keep the whitespace in between (\t\t\t) so that the output stays neat.

Sorry if my explanation is confusing; my English is not very good.

Use a raw string in your call to `re.sub`: `re.sub(r'\n([\d])', r'\1', match.group())` — Tim Biegeleisen, Mar 26 '21 at 03:51
I tried, but it doesn't work at all. The result is still the same. – Nggarap, Mar 26 '21 at 03:53

j1-lee · Accepted Answer · 2021-03-26T04:04:45.743

3

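The problem here is that you are reading the file line by line. In each iteration of for i, line in enumerate(open(file)):, re.sub sees only one line, so it cannot tell whether the next line starts with a digit. In addition, the pattern .+ does not match newline characters, so match.group() never contains a \n for re.sub to replace.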
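To see why nothing gets replaced, here is a minimal sketch using an in-memory string. The variable sample is a made-up stand-in for the contents of "out FCE.txt" (the real file is not shown in the question), and splitlines(keepends=True) is used to imitate the chunks that iterating over a file would yield.

```python
import re

# Hypothetical sample text standing in for the contents of "out FCE.txt".
sample = "abc\t\t\t\n1 def\nghi\n"

# Iterating over a file yields chunks like these, each ending in "\n".
chunks = sample.splitlines(keepends=True)

# In no chunk is "\n" followed by a digit, so the pattern never matches.
hits = [bool(re.search(r"\n(\d)", chunk)) for chunk in chunks]
print(hits)  # [False, False, False]

# ".+" stops before the newline, so its matches contain no "\n" at all.
dot_matches = [m.group() for chunk in chunks for m in re.finditer(r".+", chunk)]
print(dot_matches)  # ['abc\t\t\t', '1 def', 'ghi']
```

The newline and the digit that should follow it always end up in two different chunks, which is why re.sub has nothing to work on.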

Try instead:

import re

file = "out FCE.txt"

with open(file, 'r') as f:
    text = f.read()

new_text = re.sub(r"\n([\d])", r"\1", text)
print(new_text)

In this code the file is read as a whole (into the variable text) so that re.sub now sees whether the subsequent line starts with a digit.
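The same substitution can be tried without a file. This is only a sketch: the function names and the sample string are invented for illustration. The second function uses a lookahead, (?=\d), as an alternative that removes the newline without having to put the digit back through \1.

```python
import re

def join_digit_lines(text):
    """Remove each newline that is directly followed by a digit."""
    return re.sub(r"\n(\d)", r"\1", text)

def join_digit_lines_lookahead(text):
    """Same result, using a lookahead so no group needs to be restored."""
    return re.sub(r"\n(?=\d)", "", text)

# Hypothetical sample text; the tabs before the newline are preserved.
sample = "abc\t\t\t\n1 def\nghi\n"
print(repr(join_digit_lines(sample)))  # 'abc\t\t\t1 def\nghi\n'
```

Both variants leave the \t\t\t in place, because only the newline itself is removed.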

edited Mar 26 '21 at 04:04

answered Mar 26 '21 at 04:00


Wow thank you man. But what if I still want to enumerate the file? Is there any approach? – Nggarap Mar 26 '21 at 04:03
I don't think that is possible with looping over each line (or there might be a way, but I guess it would be quite ugly.) – j1-lee Mar 26 '21 at 04:07
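If the goal is only to number the lines after they have been joined, one possible approach (a sketch, not something proposed in the thread) is to run the substitution on the whole text first and then enumerate the resulting lines. The function name enumerate_joined_lines and the sample string are made up for illustration. This still does not process the file one line at a time.

```python
import re

def enumerate_joined_lines(text):
    """Join digit-leading lines onto the previous line, then number the result."""
    joined = re.sub(r"\n(\d)", r"\1", text)
    return list(enumerate(joined.splitlines()))

# Hypothetical sample text standing in for the file contents.
sample = "abc\t\n1 def\nghi\n"
for i, line in enumerate_joined_lines(sample):
    print(i, repr(line))
```

The index i here counts the joined lines, not the lines of the original file.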
