
I'm happy to ask my first Python question! I would like to strip the beginning of the sample file below (everything before the first occurrence of "article:"). To do this I use the re.sub function.

Here is my sample file, sample.txt:

fdasfdadfa
adfadfasdf
afdafdsfas
adfadfadf
adfadsf
afdaf

article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc

And my Python code to parse this file:

import re

# Read the whole file into one string
test = ''
for line in open('sample.txt'):
    test = test + line

result = re.sub(r'.*article:', 'article', test, 1, flags=re.S)
print(result)

Sadly this code only displays the last article. The output of the code:

article: name of the first article
ccccccc
ccccccc
ccccccc

Does someone know how to strip only the beginning of the file and display the 3 articles?

Arun
skadomers
  • I don't really understand what you are trying to do from your code. Are you trying to replace all instances of "article: " with "article"? – ggbranch Mar 28 '18 at 02:58
  • Using a [non-greedy regex](https://stackoverflow.com/questions/7124778/how-to-match-anything-up-until-this-sequence-of-characters-in-a-regular-expres) would help you here (`.*` --> `.*?`). Also, the replacement string is missing the `:`. However, slurping the whole file is not advisable if the file is big. You could also use `open('sample.txt').read()` instead of the custom for loop. – Sundeep Mar 28 '18 at 07:24
  • Oh great, I tried the non-greedy regex and it works! Thanks a lot. – skadomers Mar 28 '18 at 14:22
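
For readers following the comments, here is a minimal sketch of the greedy versus non-greedy difference. It assumes a shortened, inline copy of the sample text rather than reading sample.txt, and the variable names (`text`, `greedy`, `lazy`) are only illustrative. It also includes the `:` in the replacement string.

```python
import re

# Shortened stand-in for sample.txt (an assumption, to keep the sketch self-contained)
text = (
    "fdasfdadfa\n"
    "afdaf\n"
    "\n"
    "article: name of the first article\n"
    "aaaaaaa\n"
    "article: name of the first article\n"
    "bbbbbbb\n"
    "article: name of the first article\n"
    "ccccccc\n"
)

# Greedy: with re.S, .* runs to the LAST "article:", so only the final article survives.
greedy = re.sub(r'.*article:', 'article:', text, count=1, flags=re.S)

# Non-greedy: .*? stops at the FIRST "article:", so all three articles survive.
lazy = re.sub(r'.*?article:', 'article:', text, count=1, flags=re.S)

print(lazy)
```

With the real file, `text` could be obtained with `open('sample.txt').read()`, with the caveat already mentioned that this loads the whole file into memory.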

1 Answer


You can use itertools.dropwhile to get this effect:

from itertools import dropwhile

with open('filename.txt') as f:
    articles = ''.join(dropwhile(lambda line: not line.startswith('article'), f))

print(articles)

prints

article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc
Patrick Haugh
  • Thanks for your help, it works with itertools.dropwhile. I didn't know this function. Sundeep also gave me another solution: using a non-greedy expression. – skadomers Mar 28 '18 at 14:29
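
If the file is large, the kept lines can also be consumed one at a time instead of being joined into a single string. The following is only a sketch under assumptions: `strip_preamble` is a hypothetical helper name, the marker `'article:'` is taken from the sample, and an in-memory `io.StringIO` stands in for the open file.

```python
import io
import sys
from itertools import dropwhile


def strip_preamble(lines, marker='article:'):
    """Lazily skip lines until the first one that starts with marker."""
    return dropwhile(lambda line: not line.startswith(marker), lines)


# Stand-in for open('sample.txt'); any iterable of lines works the same way
sample = io.StringIO(
    "junk\nmore junk\n\narticle: one\naaa\narticle: two\nbbb\n"
)

kept = []
for line in strip_preamble(sample):
    kept.append(line)       # collected here only so the result can be inspected
    sys.stdout.write(line)  # each line is written as soon as it is read
```

Once the first matching line is seen, dropwhile passes every remaining line through unchanged, so later "article:" headers are kept.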