I am a newbie and would like to extract dates from a txt file and write them to another file, one date per line, but I don't see how. I tried append mode but could not get it to work, and this way only the last date is written:

import re

f = open("Krupp.txt", "r")
contents = f.read()

f.close() #close the file

# finditer
# finds all Dates and shows them in a List (Montag, 15. März 2013)
for m in re.finditer("(Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonnabend|Sonntag)(, )([123][0-9]|[1-9])(. )(Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)( )([0-2][0-9][0-9][0-9])", contents):
    print m.group(0)
    # changed
    with open("testoutput.txt", "a") as myfile:
        myfile.write(m.group(0))

---EDIT--- I changed

f.write(contents) # writes contents correctly to file with Umlauts
    f.write(m.group(0))

to

with open("testoutput.txt", "a") as myfile:
    myfile.write(m.group(0))

Now it writes all dates to the file, but it writes them directly one after another on the same line. What do I have to add, if I want them below each other?

Can anybody help?

best regards

Elite
  • Please provide more details. Show a sample of what the file looks like. Are you getting anything from your regex match? Furthermore, you are overwriting the file inside your loop, because you reopen it in 'write' mode on every iteration. Open the file once, outside the loop, and then write. – idjaw Mar 19 '17 at 15:59
  • You are simply overwriting your file in every iteration; maybe `open("testoutput.txt", "a")` is what you're looking for. Furthermore, opening and writing to the file on each iteration is very slow. Collect the matches in a string and write it once afterwards. – Jan Mar 19 '17 at 15:59
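To illustrate what the two comments describe, here is a minimal sketch in Python 3 syntax (the question itself uses Python 2). The sample dates and the temporary file names are assumptions made up for the illustration; `read_lines` is a hypothetical helper, not part of the question's code.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
w_path = os.path.join(tmp_dir, "w_mode.txt")
a_path = os.path.join(tmp_dir, "a_mode.txt")
once_path = os.path.join(tmp_dir, "once.txt")

# Made-up sample matches standing in for m.group(0).
dates = ["Montag, 10. März 2013", "Montag, 15. Juni 2013"]

for date in dates:
    # "w" truncates the file on every open, so earlier dates are lost.
    with open(w_path, "w", encoding="utf-8") as f:
        f.write(date + "\n")
    # "a" keeps the existing content and adds to the end.
    with open(a_path, "a", encoding="utf-8") as f:
        f.write(date + "\n")

# Alternative from the second comment: build the text first, write it once.
with open(once_path, "w", encoding="utf-8") as f:
    f.write("".join(date + "\n" for date in dates))


def read_lines(path):
    """Hypothetical helper: return the lines of a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

After this runs, the file written in "w" mode should hold only the last date, while the other two hold both.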

2 Answers

1

What do I have to add, if I want them below each other?

I guess you mean a line feed. Write one after each match:

myfile.write("\n")
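A small sketch of the difference, in Python 3 syntax and using in-memory buffers instead of a real file. The two sample dates are assumptions for illustration; in the question they would come from `m.group(0)`.

```python
import io

# Made-up sample matches standing in for m.group(0).
matches = ["Montag, 15. März 2013", "Freitag, 15. März 2013"]

joined = io.StringIO()      # behaves like the question's output file
separated = io.StringIO()   # same, but with a line feed after each date

for date in matches:
    # write() adds no line ending on its own ...
    joined.write(date)
    # ... so the "\n" has to be written explicitly.
    separated.write(date + "\n")
```

`joined.getvalue()` is one long line with both dates run together, whereas `separated.getvalue()` has one date per line.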

0

The following works for me on Python 2.7.6:

#!/bin/python
# -*- coding: utf-8 -*-

import re

f = open("Krupp.txt", "r")
contents = f.read()

f.close() #close the file

# finditer
# finds all Dates and shows them in a List (Montag, 15. März 2013)
with open("testoutput.txt", "a+") as f:
    for m in re.finditer("(Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonnabend|Sonntag)(, )([123][0-9]|[1-9])(. )(Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember)( )([0-2][0-9][0-9][0-9])", contents):
        print m.group(0)
        f.write(m.group(0))
        f.write("\n")

The data file I used to test is:

Montag, 10. März 2013
Montag, 15. Juni 2013
Freitag, 15. März 2013
Montag, 15. Januar 2013
Dienstag, 15. März 2013
Montag, 15. März 2013
Juli, 15. Februar - incomplete
Juli, 15. Februar 2013
asdasdasdasdasd;lasdjkfas;dlfjk;a fjasl;dfj ;akdfj;askjdfa
Mittwoch, 15. März 2013
test
Mittwoch, 15. Januar 2013
blah
Montag, 15. März 2013
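For reference, a quick way to check which of these lines the pattern picks up, sketched in Python 3 syntax. Two things differ from the code above and are my assumptions: the dot after the day is escaped (`\.`), on the assumption that a literal "." is intended, and the groups are non-capturing so that `findall` returns the whole match. `WEEKDAYS`, `MONTHS` and `DATE_RE` are names introduced only for this sketch.

```python
import re

WEEKDAYS = "Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonnabend|Sonntag"
MONTHS = ("Januar|Februar|März|April|Mai|Juni|Juli|August|"
          "September|Oktober|November|Dezember")

# Weekday, day number, literal dot, month name, four-digit year.
DATE_RE = re.compile(
    r"(?:" + WEEKDAYS + r"), (?:[123][0-9]|[1-9])\. (?:" + MONTHS + r") [0-2][0-9]{3}"
)

sample = """Montag, 10. März 2013
Montag, 15. Juni 2013
Freitag, 15. März 2013
Montag, 15. Januar 2013
Dienstag, 15. März 2013
Montag, 15. März 2013
Juli, 15. Februar - incomplete
Juli, 15. Februar 2013
asdasdasdasdasd;lasdjkfas;dlfjk;a fjasl;dfj ;akdfj;askjdfa
Mittwoch, 15. März 2013
test
Mittwoch, 15. Januar 2013
blah
Montag, 15. März 2013
"""

# With no capturing groups, findall returns the full matched strings.
found = DATE_RE.findall(sample)
```

The two lines starting with "Juli," are skipped because they do not begin with a weekday name, and the filler lines do not match at all.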

Code explanation/changes:

  1. I had to add # -*- coding: utf-8 -*- so that Python 2 accepts the non-ASCII characters (the ä in März) in the source file.
  2. open("testoutput.txt", "a+") opens the file in read+append mode, so existing content is kept and new data goes to the end. Since the file is only written here, plain "a" would also do.
  3. You were re-opening the file on every loop iteration, which is not recommended. I moved the open before the loop.
  4. The with open statement automatically closes the file when the with block finishes. It is generally safer, since the file is also closed when an exception or error occurs.
  5. f.write("\n"): This answers your edit. It adds a new line after each entry.
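If you are on Python 3, the same steps could look roughly like the sketch below. It is not the code I tested above, only an adaptation under some assumptions: both files are UTF-8 (hence the explicit `encoding="utf-8"`; the coding comment is not needed because Python 3 reads source files as UTF-8 by default), the dot in the pattern is escaped, and `extract_dates` is a hypothetical helper name. Temporary files stand in for Krupp.txt and testoutput.txt.

```python
import os
import re
import tempfile

DATE_RE = re.compile(
    r"(?:Montag|Dienstag|Mittwoch|Donnerstag|Freitag|Samstag|Sonnabend|Sonntag), "
    r"(?:[123][0-9]|[1-9])\. "
    r"(?:Januar|Februar|März|April|Mai|Juni|Juli|August|"
    r"September|Oktober|November|Dezember) "
    r"[0-2][0-9]{3}"
)


def extract_dates(src_path, dst_path):
    """Hypothetical helper: append each date found in src_path to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        contents = src.read()
    dates = [m.group(0) for m in DATE_RE.finditer(contents)]
    # Open the output once, outside the loop; "a" appends to existing content.
    with open(dst_path, "a", encoding="utf-8") as dst:
        for date in dates:
            dst.write(date + "\n")  # one date per line
    return dates


# Usage with temporary stand-ins for the real files.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "Krupp.txt")
dst_path = os.path.join(tmp_dir, "testoutput.txt")
with open(src_path, "w", encoding="utf-8") as f:
    f.write("Montag, 10. März 2013\ntest\n"
            "Juli, 15. Februar 2013\nMittwoch, 15. Januar 2013\n")

written = extract_dates(src_path, dst_path)
```

Because the output is opened in append mode, running `extract_dates` a second time adds the same dates again rather than replacing them.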

Let me know if you have more questions or need more explanation.

urban