I have the following text file:

This is my text file
NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,456
FRUIT
DRINK
FOOD,BURGER
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR
NUM,012
FRUIT
DRINK
FOOD,MEATBALL
CAR

And I have the following list called 'wanted':

['123', '789']

What I'm trying to do is: if the number after NUM is not in the list called 'wanted', then that line, along with the 4 lines below it, gets deleted. So the output file will look like:

This is my text file
NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR

My code so far is:

infile = open("inputfile.txt",'r')
data = infile.readlines()

for beginning_line, ube_line in enumerate(data):
    UNIT = data[beginning_line].split(',')[1]
    if UNIT not in wanted:
        del data_list[beginning_line:beginning_line+4]
Brian Tompsett - 汤莱恩
user1546610

6 Answers

You shouldn't modify a list while you are looping over it.
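To see why, here is a small hypothetical example (not from the question) that deletes even numbers from a list while enumerating it. Each deletion shifts the later items left, so the loop never looks at the item that slides into the freed slot:

```python
numbers = [2, 4, 6, 7]

# Broken: deleting shifts later items left, so the loop never visits
# the item that moves into the deleted position.
for i, n in enumerate(numbers):
    if n % 2 == 0:
        del numbers[i]

print(numbers)  # [4, 7]: 4 was never checked

# Safe: build a new list instead of mutating the one being iterated.
odds = [n for n in [2, 4, 6, 7] if n % 2 != 0]
print(odds)  # [7]
```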

What you could try is to just advance the iterator on the file object when needed:

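wanted = set(['123', '789'])

with open("inputfile.txt", 'r') as infile, open("outfile.txt", 'w') as outfile:
    for line in infile:
        if line.startswith('NUM,'):
            UNIT = line.strip().split(',')[1]
            if UNIT not in wanted:
                # Advance the file iterator past the 4 lines of this block.
                # next(infile, None) works on Python 2.6+ and 3, and does not
                # raise StopIteration if the file ends early.
                for _ in range(4):
                    next(infile, None)
                continue

        outfile.write(line)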

Also, use a set: checking membership in a set takes constant time on average, while a list is scanned element by element.
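A rough way to see the difference is to time both lookups with timeit. The 10,000-element data below is hypothetical, and the printed timings will vary by machine:

```python
import timeit

# Hypothetical data: 10,000 numbers as strings, in a list and in a set.
wanted_list = [str(n) for n in range(10000)]
wanted_set = set(wanted_list)

# '9999' is the last list element, so the list check scans every item,
# while the set check hashes the string and looks it up directly.
list_time = timeit.timeit(lambda: '9999' in wanted_list, number=1000)
set_time = timeit.timeit(lambda: '9999' in wanted_set, number=1000)
print('list: %.4fs  set: %.4fs' % (list_time, set_time))
```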

This approach doesn't require reading the entire file into a list before processing it. It goes line by line: reading from the file, advancing past unwanted blocks, and writing to the new file. If you want, you can replace outfile with a list that you append to.
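For the list variant, here is a sketch (filter_blocks and its tail parameter are hypothetical names) that appends kept lines to a list and uses itertools.islice on the same iterator to skip the block lines, instead of calling next() in a loop. It accepts any iterable of lines, so an open file works as well as a list:

```python
from itertools import islice

def filter_blocks(lines, wanted, tail=4):
    """Return the kept lines, dropping each unwanted NUM line plus the
    `tail` lines after it."""
    it = iter(lines)
    kept = []
    for line in it:
        if line.startswith('NUM,') and line.strip().split(',')[1] not in wanted:
            # Consume up to `tail` more lines from the same iterator;
            # islice simply stops if the input runs out early.
            for _ in islice(it, tail):
                pass
            continue
        kept.append(line)
    return kept


sample = ('This is my text file\nNUM,123\nFRUIT\nDRINK\nFOOD,BACON\nCAR\n'
          'NUM,456\nFRUIT\nDRINK\nFOOD,BURGER\nCAR\n').splitlines(True)
result = filter_blocks(sample, {'123', '789'})
print(''.join(result), end='')
```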

jdi
  • Is it possible to write it back to the input file? – user1546610 Aug 24 '12 at 23:39
  • Not until after you read everything out of it first. What you would want to do instead is make the outfile a tempfile. Then when the whole thing is successful, you move the outfile over the infile. – jdi Aug 25 '12 at 02:27
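Following jdi's suggestion, a sketch of the temp-file approach: write the filtered lines to a temporary file in the same directory, then move it over the input only if everything succeeded. The function name filter_file_in_place and its tail parameter are hypothetical:

```python
import os
import shutil
import tempfile

def filter_file_in_place(path, wanted, tail=4):
    """Drop unwanted NUM blocks from `path` by writing a temp file,
    then moving it over the original."""
    # Create the temp file next to the input so os.replace is a
    # same-filesystem rename, which is atomic on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, 'w') as outfile, open(path) as infile:
            for line in infile:
                if line.startswith('NUM,') and line.strip().split(',')[1] not in wanted:
                    for _ in range(tail):
                        next(infile, None)  # skip the rest of this block
                    continue
                outfile.write(line)
        # mkstemp makes the file accessible only to its owner; copy the
        # original file's permission bits before replacing it.
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # on any failure, leave the original untouched
        raise
```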

edit: deleting items while iterating is probably not a good idea, see: Remove items from a list while iterating

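wanted = ['123', '789']
SKIP_LINES = 4  # lines that follow each NUM line

with open("inputfile.txt", 'r') as infile:
    data = infile.readlines()

skip_until = -1  # index of the last line to drop; -1 means nothing to drop
result_data = []
for current_line, line in enumerate(data):
    if current_line <= skip_until:
        continue  # still inside an unwanted NUM block

    try:
        key, num = line.strip().split(',')
    except ValueError:
        pass  # lines without exactly one comma (FRUIT, DRINK, CAR, ...)
    else:
        if key == 'NUM' and num not in wanted:
            skip_until = current_line + SKIP_LINES
            continue

    result_data.append(line)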

... and result_data is what you want.

Community
yedpodtrzitko

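There are some issues with the code; for instance, data_list isn't even defined. Even if you delete from data instead, removing items from a list while you loop over it shifts the remaining items, so the loop skips lines; and the slice [beginning_line:beginning_line+4] removes only 4 lines, not the NUM line plus the 4 lines below it. Also, split(',')[1] raises an IndexError on lines without a comma, such as FRUIT. Then you use enumerate but still index into data instead of using the line it already gives you; also, readlines is not needed.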
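If you do want to stay with a list, one way around these problems is to walk it with an explicit index and build a new list instead of deleting from the one being iterated. A sketch, assuming data holds the lines from readlines() and each NUM block is 5 lines long (keep_wanted and block_len are hypothetical names):

```python
def keep_wanted(data, wanted, block_len=5):
    """Return a new list without the unwanted NUM blocks."""
    result = []
    i = 0
    while i < len(data):
        line = data[i]
        # startswith guarantees a comma, so split(',')[1] cannot raise here
        if line.startswith('NUM,') and line.strip().split(',')[1] not in wanted:
            i += block_len  # jump over the NUM line and the 4 lines below it
            continue
        result.append(line)
        i += 1
    return result


data = ('This is my text file\nNUM,123\nFRUIT\nDRINK\nFOOD,BACON\nCAR\n'
        'NUM,456\nFRUIT\nDRINK\nFOOD,BURGER\nCAR\n').splitlines(True)
print(keep_wanted(data, {'123', '789'}) == data[:6])  # True
```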

I'd suggest avoiding keeping all lines in memory; it's not needed here. Try something like this (untested):

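wanted = {'123', '789'}

with open('infile.txt') as fin, open('outfile.txt', 'w') as fout:
    for line in fin:
        # strip the newline first, or '123\n' never matches '123'
        if line.startswith('NUM,') and line.strip().split(',')[1] not in wanted:
            # skip the 4 lines that belong to this NUM block
            for _ in range(4):
                next(fin, None)
        else:
            fout.write(line)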
Lev Levitsky
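import re

wanted = ['123', '789']

with open('inputfile.txt') as infile:
    data = infile.read()  # the regexes work on the whole file as one string

# find the lines that are exactly NUM,XYZ for a wanted XYZ (so 1234 won't match 123)
nums = re.compile(r'^NUM,(?:' + '|'.join(map(re.escape, wanted)) + r')$', re.MULTILINE)
# match the NUM line plus the four lines after it (the last may lack a newline)
breaks = re.compile(r'(?:.*\n){4}.*\n?')
keeper = ''
for line in nums.finditer(data):
    keeper += breaks.match(data, line.start()).group(0)

result on the given file (the header line belongs to no block, so it isn't included):

NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR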
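Instead of collecting blocks, re.sub can delete the unwanted blocks in a single pass while keeping everything else, header included. This swaps in a negative lookahead, a different technique from the finditer loop; drop_unwanted_blocks is a hypothetical name, and the pattern assumes every block is exactly 5 lines:

```python
import re

def drop_unwanted_blocks(text, wanted):
    """Remove every NUM block whose number is not in `wanted`."""
    alternatives = '|'.join(map(re.escape, wanted))
    # ^NUM,        a NUM line (re.M lets ^ and $ match at each line)
    # (?!(...)$)   ...whose entire number is NOT one of the wanted ones
    # .*\n         the rest of that NUM line
    # (?:.*\n){3}  the next three lines of the block
    # .*\n?        the fifth line, which may be the last and lack a newline
    pattern = re.compile(
        r'^NUM,(?!(?:' + alternatives + r')$).*\n(?:.*\n){3}.*\n?', re.M)
    return pattern.sub('', text)


sample = ('This is my text file\nNUM,123\nFRUIT\nDRINK\nFOOD,BACON\nCAR\n'
          'NUM,456\nFRUIT\nDRINK\nFOOD,BURGER\nCAR\n'
          'NUM,789\nFRUIT\nDRINK\nFOOD,SAUSAGE\nCAR\n'
          'NUM,012\nFRUIT\nDRINK\nFOOD,MEATBALL\nCAR')
print(drop_unwanted_blocks(sample, ['123', '789']), end='')
```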
Matti Lyra

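If you don't mind building a list, and only if your "NUM" lines come exactly every 5 lines (here, after the one header line), you may want to try:

with open('inputfile.txt') as infile:
    lines = infile.readlines()

keep = lines[:1]  # the header line before the first NUM block
for i, v in enumerate(lines[1::5]):  # every 5th line, starting at the first NUM
    num, current = v.strip().split(",")  # strip so '123\n' matches '123'
    if current in wanted:
        start = 1 + i * 5
        keep.extend(lines[start:start + 5])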
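Under the same fixed-layout assumption, the blocks can also be grouped directly with the zip(*[iter(s)] * n) idiom from the Python docs, which avoids the index arithmetic. A sketch; keep_blocks, header_len and block_len are hypothetical names:

```python
def keep_blocks(lines, wanted, header_len=1, block_len=5):
    """Keep the header lines plus every block whose NUM is wanted."""
    keep = list(lines[:header_len])
    # zip(*[it] * n) passes the same iterator n times, so each tuple holds
    # n consecutive lines; a trailing incomplete block is silently dropped.
    it = iter(lines[header_len:])
    for block in zip(*[it] * block_len):
        num = block[0].strip().split(',')[1]
        if num in wanted:
            keep.extend(block)
    return keep


lines = ('This is my text file\nNUM,123\nFRUIT\nDRINK\nFOOD,BACON\nCAR\n'
         'NUM,456\nFRUIT\nDRINK\nFOOD,BURGER\nCAR\n').splitlines(True)
print(''.join(keep_blocks(lines, {'123', '789'})), end='')
```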
Pierre GM

Don't try to think of this in terms of building up a list and removing stuff from it while you loop over it. That way lies madness.

It is much easier to write the output file directly. Loop over lines of the input file, each time deciding whether to write it to the output or not.

Also, to avoid difficulties with the fact that not every line has a comma, try using .partition instead of .split. It always returns 3 items: when there is a comma, you get (before the first comma, the comma, after the first comma); otherwise, you get (the whole thing, empty string, empty string). So you can always unpack the result, and a line without a comma just gets a key that never equals 'NUM'. Strip the trailing newline first, though, or '123\n' will never match '123'.

with open('inputfile.txt') as infile, open('outfile.txt', 'w') as outfile:
    skip_counter = 0
    for line in infile:
        key, _, value = line.rstrip('\n').partition(',')
        if key == 'NUM' and value not in wanted:
            skip_counter = 5  # this NUM line plus the 4 lines after it
        if skip_counter:
            skip_counter -= 1
        else:
            outfile.write(line)
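A quick look at what .partition returns for the kinds of lines in this file (the first three strings come from the question; 'A,B,C' is a hypothetical extra case showing that only the first comma splits):

```python
samples = ['NUM,123\n', 'FOOD,BACON\n', 'FRUIT\n', 'A,B,C\n']
parts = [line.rstrip('\n').partition(',') for line in samples]
for part in parts:
    print(part)
# ('NUM', ',', '123')
# ('FOOD', ',', 'BACON')
# ('FRUIT', '', '')
# ('A', ',', 'B,C')
```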
Karl Knechtel