How to strip the beginning of a file with Python's re.sub?

Question

This is my first Python question! I would like to strip the beginning of the sample file below, i.e. everything before the first occurrence of "article:". To do this I use re.sub from the re module.

Here is my file, sample.txt:

fdasfdadfa
adfadfasdf
afdafdsfas
adfadfadf
adfadsf
afdaf

article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc

And my Python code to parse this file:

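import re

test = ''  # accumulates the whole file as one string
for line in open('sample.txt'):
    test = test + line

# replace everything up to "article:" with "article" (one substitution; re.S lets . match newlines)
result = re.sub(r'.*article:', 'article', test, count=1, flags=re.S)
print(result)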

Sadly, this code only displays the last article. Here is the output:

article: name of the first article
ccccccc
ccccccc
ccccccc
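The output is explained by greedy matching: with `re.S` (DOTALL), `.` also matches newlines, so `.*` first runs to the end of the string and then backtracks only until `article:` can match, which happens at the *last* occurrence. A minimal comparison on a shortened, made-up stand-in for sample.txt:

```python
import re

# Shortened, hypothetical stand-in for sample.txt
text = "junk\nmore junk\narticle: A\naaa\narticle: B\nbbb\n"

# Greedy: .* runs to the end of the string, then backs off to the LAST "article:"
greedy = re.sub(r'.*article:', 'article:', text, count=1, flags=re.S)

# Non-greedy: .*? grows one character at a time and stops at the FIRST "article:"
lazy = re.sub(r'.*?article:', 'article:', text, count=1, flags=re.S)

print(repr(greedy))  # → 'article: B\nbbb\n'
print(repr(lazy))    # → 'article: A\naaa\narticle: B\nbbb\n'
```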

Does anyone know how to strip only the beginning of the file and display all three articles?

I don't really understand what you are trying to do from your code. Are you trying to replace all instances of "article: " with "article"? — ggbranch, Mar 28 '18 at 02:58
Using a [non-greedy regex](https://stackoverflow.com/questions/7124778/how-to-match-anything-up-until-this-sequence-of-characters-in-a-regular-expres) would help you here (`.*` --> `.*?`). Also, the replacement string is missing the `:`. Slurping the whole file is not advisable if the file is big, though. And you could use `open('sample.txt').read()` instead of the custom for loop. – Sundeep, Mar 28 '18 at 07:24
Oh great, I tried the non-greedy regex and it works! Thanks a lot. – skadomers, Mar 28 '18 at 14:22
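Combining the three suggestions from the comments (non-greedy `.*?`, keeping the `:` in the replacement, and `read()` instead of the concatenation loop) gives something like the sketch below. The helper names `strip_preamble` and `print_articles` are hypothetical; only the file name `sample.txt` comes from the question.

```python
import re

def strip_preamble(text):
    # Hypothetical helper. Non-greedy .*? with re.S (newlines included)
    # stops at the FIRST "article:"; count=1 performs only that one
    # substitution. The replacement keeps the colon. If "article:"
    # never occurs, the text comes back unchanged.
    return re.sub(r'.*?article:', 'article:', text, count=1, flags=re.S)

def print_articles(path):
    # Hypothetical helper. read() loads the whole file in one call,
    # replacing the test = test + line loop (the file is still held
    # in memory in full).
    with open(path) as f:
        print(strip_preamble(f.read()))
```

Calling `print_articles('sample.txt')` should print the three articles. As the comment points out, this approach still reads the entire file into memory.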

score 3 · Accepted Answer · answered Mar 28 '18 at 03:08


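You can use itertools.dropwhile to get this effect: it skips lines as long as the predicate is true (here, until the first line that starts with "article") and then yields every remaining line.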

from itertools import dropwhile

# skip lines until the first one starting with 'article', then keep all the rest
with open('sample.txt') as f:
    articles = ''.join(dropwhile(lambda line: not line.startswith('article'), f))

print(articles)

This prints:

article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc
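Because `dropwhile` returns a lazy iterator, the lines do not have to be joined into one string; they can be written out one at a time, which avoids holding a large file in memory. A sketch, where the helper name `copy_articles` and the `io.StringIO` objects standing in for real files are illustrative assumptions. It tests for `'article:'` rather than `'article'` so that a preamble line merely beginning with "article" (e.g. "articles") would not stop the skipping.

```python
import io
from itertools import dropwhile

def copy_articles(src, dst):
    # dropwhile skips lines while the predicate is true, i.e. until the
    # first line starting with "article:", then yields every line after it
    # (including later lines that do not start with "article:").
    for line in dropwhile(lambda line: not line.startswith('article:'), src):
        dst.write(line)  # one line at a time; nothing is joined in memory

# Demo with in-memory files standing in for sample.txt and an output file
src = io.StringIO("junk\nmore junk\narticle: A\naaa\narticle: B\nbbb\n")
dst = io.StringIO()
copy_articles(src, dst)
print(dst.getvalue(), end='')
```

With real files, `src` would be the object returned by `open('sample.txt')` and `dst` an output file opened with mode `'w'`.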


Patrick Haugh


Thanks for your help, it works with itertools.dropwhile. I didn't know this module. Sundeep also gave me another solution: using a non-greedy expression. – skadomers Mar 28 '18 at 14:29
