Python using re module to parse an imported text file

Question

def regexread():
    import re

    result = ''
    savefileagain = open('sliceeverfile3.txt','w')

    #text=open('emeverslicefile4.txt','r')
    text='09,11,14,34,44,10,11,  27886637,    0\n561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07,  19070109,    0\n560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06,  13063500,    0\n559, Tue,29,Jan,2013,'

    pattern='\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
    #with open('emeverslicefile4.txt') as text:     
    f = re.findall(pattern,text)
    
    for item in f:
        print(item)
        
    savefileagain.write(item)
    #savefileagain.close()

The function above, as written, parses the text and finds sets of seven numbers. I have three problems.

Firstly, when I read from the input file, which contains exactly the same text as text='09,...etc', I get "TypeError: expected string or buffer", which I cannot solve even after reading some of the related posts.
Secondly, when I try to write the results to the output file, nothing is written.
Thirdly, I am not sure how to get the same output in the file that I get with the print statement: three lines of seven numbers each, which is the output that I want.

score 13 · Accepted Answer · edited Apr 14 '22 at 14:07

This should do the trick:

import re

filename = 'sliceeverfile3.txt'
# Raw string so the backslashes reach the regex engine unchanged
pattern = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
new_file = []

# Make sure the file gets closed after being read
with open(filename, 'r') as f:
    # Read the file contents into a list with one entry per line
    lines = f.readlines()

# Iterate over each line
for line in lines:
    # Apply the regex to each line
    match = re.search(pattern, line)
    if match:
        # Add \n so each match lands on its own line when written back
        new_line = match.group() + '\n'
        print(match.group())
        new_file.append(new_line)

# Mode 'w' truncates the file, so writing starts at the beginning
with open(filename, 'w') as f:
    # Actually write the lines
    f.writelines(new_file)
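
If the input turns out to be one continuous line rather than one draw per line (as the comments below suggest), running `re.findall` over the whole file contents may be simpler than matching line by line. The following is only a minimal sketch under that assumption. The helper names `extract_draws` and `rewrite_file` and the output filename `draws_only.txt` are hypothetical and not from the original post.

```python
import re

# Same seven-number pattern as above, as a raw string
PATTERN = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'

def extract_draws(text):
    """Return every seven-number group found anywhere in text."""
    return re.findall(PATTERN, text)

def rewrite_file(in_name, out_name):
    """Copy only the seven-number groups from in_name to out_name."""
    # Read the whole file as one string, so line structure does not matter
    with open(in_name, 'r') as src:
        draws = extract_draws(src.read())
    # Write to a separate file so the original data is kept
    with open(out_name, 'w') as dst:
        dst.writelines(draw + '\n' for draw in draws)
    return draws

# Sample taken from the text in the question
sample = ('09,11,14,34,44,10,11,  27886637,    0\n'
          '561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07,  19070109,    0\n')
print(extract_draws(sample))
```

A call such as `rewrite_file('emeverslicefile4.txt', 'draws_only.txt')` would then leave the input file untouched and put one match per line in the output file.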

edited Apr 14 '22 at 14:07

vvvvv

answered Feb 12 '13 at 22:20

OmegaOuter

Thank you. This returns only one line of numbers 09,11,14,34,44,10,11. Perhaps I have done the indents incorrectly? The file that I am reading from is as follows N1,N2,N3,N4,N5,L1,L2, Jackpot, Wins\n562, Fri, 8,Feb,2013, 09,11,14,34,44,10,11, 27886637, 0\n561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n559, Tue,29,Jan,2013, 09,16,26,36,39,02,06, 6431250, 2\n558, Fri,25,Jan,2013, 03,10,18,31,37,02,04, 37772357, 1\n557, Tue,22,Jan,2013, Thank you for your help. – user1478335 Feb 13 '13 at 09:54
    for line in lines:
        # Regex applied to each line
        match = re.findall(pattern, line)
        if match:
            # Make sure to add \n to display correctly when we write it back
            #new_line = match.group() + '\n'
            print (match)
            new_file.append(match)
    lines = f.readlines()
I changed it to the script above, which seems to work. I think that the file is simply one continuous 'sentence' and not separate lines as it appears in a text editor? – user1478335 Feb 13 '13 at 10:10
Fixed the issue; I didn't really test the code. I had put f.write instead of f.writelines, which is the correct method for writing a list of strings to a file. It will write only the matching numbers to the file. If you require different output, modify the contents of new_line so that it's reflected in the final file. Also, I would recommend using another filename for the output file; it's better to keep the originals ;) – OmegaOuter Feb 13 '13 at 23:59
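
For reference, a minimal sketch of the `write` versus `writelines` distinction raised in the comment above. It uses `io.StringIO` as a stand-in for a real file, an assumption made only to keep the example self-contained, and the variable names are illustrative.

```python
import io

new_file = ['09,11,14,34,44,10,11\n', '06,25,31,40,45,06,07\n']

buf = io.StringIO()        # in-memory stand-in for an open text file
buf.writelines(new_file)   # takes an iterable of strings; adds no newlines itself
written = buf.getvalue()

# write() expects a single string, so passing the list raises TypeError
try:
    io.StringIO().write(new_file)
    write_failed = False
except TypeError:
    write_failed = True

print(written, end='')
print(write_failed)
```

Note that `re.findall` returns a list, so a loop that appends its result directly builds a list of lists, which `writelines` would also reject. Using `new_file.extend(...)` with a newline added to each match is one way around that.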

score 0 · Answer 2 · edited May 23 '17 at 12:16

You're sort of on the right track...

Iterate over the file (see "How to iterate over the file in python") and apply the regex to each line. The linked question should really answer all three of your questions once you realize that you're trying to write 'item' outside of that loop, where it only holds the last match.
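
To make that concrete, here is a minimal sketch of the two pitfalls in the question's function, assuming it was run with the commented-out `open(...)` line active. `re.findall` needs a string rather than a file object, and a write placed after the loop sees only the last `item`. `io.StringIO` stands in for real files here, and the names `source` and `out` are illustrative.

```python
import io
import re

pattern = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
source = io.StringIO(
    '09,11,14,34,44,10,11,  27886637,    0\n'
    '561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07,  19070109,    0\n'
)

# Pitfall 1: passing the file object itself raises TypeError
try:
    re.findall(pattern, source)
    raised = False
except TypeError:
    raised = True

# Fix: pass the file's contents as a string
items = re.findall(pattern, source.read())

# Pitfall 2: a write after the loop would only see the last item,
# so write inside the loop, one match per line
out = io.StringIO()
for item in items:
    out.write(item + '\n')

print(raised)
print(out.getvalue(), end='')
```

In the original function the output file is also never closed, which is a likely reason nothing appears on disk. Opening it in a `with` block closes it automatically.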

edited May 23 '17 at 12:16

Community

answered Feb 12 '13 at 19:45

brwnj
