extract strings from files using regex with python

Asked Dec 14 '16 at 11:56

Active Dec 14 '16 at 12:31

Viewed 30 times

I want to extract specific parts of text files with Python.

Here is my code:

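import re

with open('test1.txt') as test_text:
    # Read the whole file into a single string
    data = test_text.read()
    # findall() returns the text captured by the group for each match
    wanted_match = re.findall('start(\n.*?)+?end', data)
    wanted_match_str = ",".join(wanted_match)
with open("output.txt", "w") as output:
    # Write the comma-separated matches to the output file
    output.write(wanted_match_str)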

My text files look like this (they include newlines):

blablabla start blobloblobloblo bloblo blobloblo end bla blablabla start blobloblobloblo bloblo blobloblo end bla blablabla

and so on. I want to extract only the blobloblo parts of the text (not the blabla parts) and write them to a file. According to pythex (http://pythex.org) my regex should work, but all I get as output is a list of commas. Can you help me? Thanks in advance! majee

edited Dec 14 '16 at 12:31

majee
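The "list of commas" can be reproduced with a minimal sketch, assuming (hypothetically) that each blob sits on its own lines between `start` and `end`; the exact layout of the file isn't shown above. In Python's `re`, a capture group that is repeated, like `(\n.*?)+?`, only keeps its last repetition, and `findall()` returns that group's text rather than the whole match.

```python
import re

# Hypothetical sample: a newline right after "start" and right before "end".
sample = "bla start\nblob1\nblob2\nend bla start\nblob3\nend bla"

# The pattern from the question. The group (\n.*?) is repeated by +?,
# but a repeated group only remembers its final repetition, which here
# is the lone "\n" just before each "end".
matches = re.findall(r"start(\n.*?)+?end", sample)

print(matches)            # prints ['\n', '\n']
print(",".join(matches))  # a newline, a comma, a newline: "just commas"
```

So the regex does match each block, but the text handed back per match is only that last `\n`.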

@PavneetSingh: No, simply `(?s)start(.*?)end` – Wiktor Stribiżew Dec 14 '16 at 12:02
the texts include newlines, so simply start(.*?)+?end doesn't work. – majee Dec 14 '16 at 12:20
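On the newline point: without a flag, `.` does not match `\n`, so `start(.*?)end` finds nothing when `start` and `end` are on different lines. The inline flag `(?s)` in the suggested pattern is equivalent to passing `re.DOTALL` and lets `.` match newlines as well. A short sketch, again on a hypothetical sample:

```python
import re

# Hypothetical sample where each blob spans several lines.
sample = "bla start\nblob1\nblob2\nend bla start\nblob3\nend bla"

# Without DOTALL, "." stops at "\n", so no match can reach "end".
plain = re.findall(r"start(.*?)end", sample)

# (?s) turns on DOTALL inline, so ".*?" may cross newlines;
# the lazy "*?" still stops at the first "end" after each "start".
dotall = re.findall(r"(?s)start(.*?)end", sample)

print(plain)   # prints []
print(dotall)  # prints ['\nblob1\nblob2\n', '\nblob3\n']
```

Note there is no `+?` after the group here: one lazy group already spans the whole blob once `.` can cross newlines.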
the code now gives me the whole text as output and not just the blob-parts. – majee Dec 14 '16 at 12:21
The following modification worked: wanted_match = re.findall('start.*?end', data, re.DOTALL) – majee Dec 14 '16 at 13:44
It is exactly the same, just the delimiters are now part of the output. – Wiktor Stribiżew Dec 14 '16 at 13:46
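The difference comes from how `re.findall()` treats groups: with exactly one capture group it returns that group's text, and with no groups it returns the whole match, so `start.*?end` keeps the delimiters while `start(.*?)end` does not. Below is a sketch of the question's script using the grouped pattern; `extract_blobs` and `write_blobs` are hypothetical helper names, and stripping whitespace and writing one blob per line (instead of joining with commas) are assumptions, not something the question asked for.

```python
import re

# One capture group: findall() returns only the text between the delimiters.
BLOB_RE = re.compile(r"start(.*?)end", re.DOTALL)


def extract_blobs(text):
    """Return the text between each 'start' and the next 'end'."""
    # Assumption: surrounding spaces/newlines are not wanted, so strip them.
    return [blob.strip() for blob in BLOB_RE.findall(text)]


def write_blobs(src="test1.txt", dst="output.txt"):
    """Read src, extract every blob, and write them to dst, one per line."""
    with open(src) as infile:
        data = infile.read()
    with open(dst, "w") as outfile:
        outfile.write("\n".join(extract_blobs(data)))


if __name__ == "__main__":
    sample = "bla start\nblob1\nblob2\nend bla start\nblob3\nend bla"
    # No group: each item still contains the delimiters.
    print(re.findall(r"start.*?end", sample, re.DOTALL))
    # prints ['start\nblob1\nblob2\nend', 'start\nblob3\nend']
    # One group (plus strip()): only the inner text.
    print(extract_blobs(sample))
    # prints ['blob1\nblob2', 'blob3']
```

Calling `write_blobs()` reads `test1.txt` and writes `output.txt`, as in the original script.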
