
I have the regular expression \n([\d]), which matches a newline followed by a digit in the following text: [screenshot of the matched text]

I then want to replace the matched text with the first group, $1, in Visual Studio Code. This is the result:

[screenshot of the result]

I want to do the same thing in Python, and I have written this code:

import re    

file = "out FCE.txt"    
pattern = re.compile(".+")

for i, line in enumerate(open(file)):
    for match in re.finditer(pattern, line):
        print(re.sub(r"\n([\d])", r"\1", match.group()))

But that code does nothing: the result is still the same as the first picture. The newlines before the lines that start with a digit are not removed. I have already read this answer, which says that Python uses \1, not $1. And yes, I want to keep the whitespace in between, so that the output stays neatly aligned with \t\t\t.

Sorry if my explanation is confusing; my English is not very good.

Nggarap

1 Answer


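The problem here is that you are reading the file line by line. In each iteration of for i, line in enumerate(open(file)):, re.sub sees only one line, so it cannot tell whether the next line starts with a digit. In addition, the pattern .+ does not match a newline by default, so match.group() never contains a \n for the substitution to find.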
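To make this concrete, here is a minimal sketch that runs the same per-line logic on a made-up sample (the sample string is an assumption; the real contents of out FCE.txt are not shown in the question). Each line carries its newline only at the very end, and .+ stops before it, so nothing is ever replaced:

```python
import re

# Hypothetical sample text; the second line starts with a digit.
sample = "abc\t\t\t\n1 def\nghi\n"

pattern = re.compile(".+")
per_line_results = []
for line in sample.splitlines(keepends=True):
    for match in pattern.finditer(line):
        # match.group() holds no "\n", because "." does not match a newline by default,
        # so the substitution below has nothing to replace.
        per_line_results.append(re.sub(r"\n([\d])", r"\1", match.group()))

print(per_line_results)
```

The pieces come back unchanged, which matches the behaviour described in the question.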

Try instead:

import re

file = "out FCE.txt"

with open(file, 'r') as f:
    text = f.read()

new_text = re.sub(r"\n([\d])", r"\1", text)
print(new_text)

In this code the file is read as a whole (into the variable text), so re.sub can now see whether the following line starts with a digit.
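As a quick check that does not depend on the file, the same substitution can be tried on an in-memory string. This is only a sketch: the sample text and the helper name join_digit_lines are made up for illustration.

```python
import re

def join_digit_lines(text):
    """Remove every newline that is immediately followed by a digit."""
    return re.sub(r"\n([\d])", r"\1", text)

# Hypothetical sample; the second line starts with a digit.
sample = "abc\t\t\t\n1 def\nghi\n"
joined = join_digit_lines(sample)
print(repr(joined))
```

The tabs before the removed newline are kept, so the digit-led line is appended after the \t\t\t of the previous line.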

j1-lee
  • Wow thank you man. But what if I still want to enumerate the file? Is there any approach? – Nggarap Mar 26 '21 at 04:03
  • I don't think that is possible with looping over each line (or there might be a way, but I guess it would be quite ugly.) – j1-lee Mar 26 '21 at 04:07
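If the goal is simply to number the lines of the result, one possible approach is to run the substitution on the whole text first and enumerate the joined lines afterwards. This is a sketch under that assumption, not something taken from the answer; the helper name enumerate_joined and the sample text are hypothetical.

```python
import re

def enumerate_joined(text, start=0):
    """Return an iterator of (index, line) pairs, computed after digit-led
    lines have been joined onto the line before them."""
    joined = re.sub(r"\n([\d])", r"\1", text)
    return enumerate(joined.splitlines(), start)

# Hypothetical sample; in practice text would come from f.read().
sample = "abc\t\t\t\n1 def\nghi\n"
for i, line in enumerate_joined(sample):
    print(i, repr(line))
```

Note that the indices refer to the lines after joining, not to the line numbers of the original file.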