Python Loop Utilising next() Skipping Lines

Question

I have a script that I am using to parse a text file.
The script has a while loop in it, as there may be multiple follow-on lines under each group. My current script is skipping lines. I am pretty sure it has something to do with my use of next() and its placement, but I can't figure it out.
This is an example of the text file:

object-group network TestNetwork1
 description TestDescription
 network-object host TestHost
 network-object host TestHost
 network-object host TestHost
 network-object host TestHost
object-group network TestNetwork2
 description TestDescription
 network-object host TestHost
object-group network TestNetwork3
 description TestDescription
 network-object host TestHost
object-group network TestNetwork4
 description TestDescription
 network-object host TestHost
object-group network TestNetwork5
 description TestDescription
 network-object host TestHost
object-group network TestNetwork6
 description TestDescription
 network-object host TestHost
object-group network TestNetwork7
 description TestDescription
 network-object host TestHost
object-group network TestNetwork8
 description TestDescription
 network-object host TestHost
object-group network TestNetwork9
 description TestDescription
 network-object host TestHost
object-group network TestNetwork10s
 description TestDescription
 network-object host TestHost

Here is the script:

import csv
Count = 0
objects = open("test-object-groups.txt", 'r+')
iobjects = iter(objects)

with open('object-group-test.csv', 'wb+') as filename2:
    writer2 = csv.writer(filename2)
    for lines in iobjects:
        if lines.startswith("object-group network"):
            print lines
            Count += 1
            linesplit = lines.split()
            writer2.writerow([linesplit[2]])
            while True:
                nextline = str(next(iobjects))
                if nextline.startswith(" network-object") or nextline.startswith(" description"):
                    nextlinesplit = nextline.split()
                    if nextlinesplit[1] <> "host" and nextlinesplit[1] <> "object" and nextlinesplit[0] <> "description":
                        writer2.writerow(['','subnet', nextlinesplit[1], nextlinesplit[2]])
                    elif nextlinesplit[1] == "host":
                        writer2.writerow(['',nextlinesplit[1], nextlinesplit[2]])
                    elif nextlinesplit[1] == "object":
                        writer2.writerow(['',nextlinesplit[1], nextlinesplit[2]])
                    elif nextlinesplit[0] == "description":
                        writer2.writerow(['',nextlinesplit[0]])

                elif nextline.startswith("object-group"):
                    break

print Count

Here is the output showing that it is skipping lines:

object-group network TestNetwork1

object-group network TestNetwork3

object-group network TestNetwork5

object-group network TestNetwork7

object-group network TestNetwork9

5

As you can see above, every other object-group is skipped.
Any idea how to fix this?

Have you tried commenting out code until you find the line that causes the problem? — klutt, Dec 03 '17 at 14:11

DeepSpace · Answer 1 · 2017-12-03T15:01:03.953

for lines in iobjects:
    ...
    ...
    while True:
        nextline = str(next(iobjects))

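Of course it will skip a line. You are calling next(iobjects) while the for loop is iterating over the same iobjects, so every line the inner loop reads is consumed and never reaches the for loop. In particular, the inner loop only breaks after it has already read the next "object-group" line, so that header is gone by the time the for loop resumes, and every second group is skipped.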

Consider a file test.txt that holds one integer per line (the code below calls int() on each line).

And this code:

with open('test.txt') as f:
    for line in f:
        print(line)
        if int(line.strip()) % 2 == 0:
            next(f)

In the output, the line after each even number is missing, since next(f) consumes it before the for loop gets to it.
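A self-contained way to reproduce the effect is sketched below. It uses io.StringIO as a stand-in for the file, and the contents (the integers 1 to 7) are assumed for illustration rather than taken from the original example.

```python
import io

# In-memory stand-in for a file with one integer per line (assumed contents).
f = io.StringIO("1\n2\n3\n4\n5\n6\n7\n")

seen = []
for line in f:
    seen.append(line.strip())
    if int(line.strip()) % 2 == 0:
        # This consumes the following line, so the for loop never sees it.
        # The default of None avoids StopIteration at the end of the data.
        next(f, None)

print(seen)  # ['1', '2', '4', '6']
```

The lines 3, 5 and 7 never show up because both the for loop and next() draw from the same iterator.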

Suggested solutions:

Use itertools.tee to create two independent iterators over the file. Probably the least straightforward solution.
Use f.readlines() and operate on a list of lines from the file instead of the iterator. This way you can work with indices.
Use the more-itertools package which creates a "peekable" iterator: https://stackoverflow.com/a/27698681/1453822

Don't parse the file line by line. Use regex to extract information from the file block by block. For example, the regex r'(object-group.*?)(?=$|object-group)' will do. (I'm sure this is far from the optimal regex). Make sure you are using the re.DOTALL flag.

import re

with open('test.txt') as f:
    file_content = f.read()

for group in re.findall(r'(object-group.*?)(?=$|object-group)', file_content, re.DOTALL):
    print(group)

# object-group network TestNetwork1
#  description TestDescription
#  network-object host TestHost
#  network-object host TestHost
#  network-object host TestHost
#  network-object host TestHost
# 
# object-group network TestNetwork2
#  description TestDescription
#  network-object host TestHost
# 
# object-group network TestNetwork3
#  description TestDescription
#  network-object host TestHost
# 
# object-group network TestNetwork4
#  description TestDescription
#  network-object host TestHost
# 
# object-group network TestNetwork5
#  description TestDescription
#  network-object host TestHost
# 
# object-group network TestNetwork6
#  description TestDescription
#  network-object host TestHost
# 
# object-group network TestNetwork7
#  description TestDescription
#  network-object host TestHost
# 
# object-group network TestNetwork8
#  description TestDescription
#  network-object host TestHost
# 
# object-group network TestNetwork9
#  description TestDescription
#  network-object host TestHost
# 
# object-group network TestNetwork10s
#  description TestDescription
#  network-object host TestHost
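The readlines() suggestion from the list above could look roughly like the sketch below. The function name parse_groups and the shortened sample data are made up for illustration. The row layout copies the one in the question, and the code assumes every network-object line has at least three whitespace-separated tokens.

```python
import io

# Shortened sample in the same shape as the file in the question.
SAMPLE = (
    "object-group network TestNetwork1\n"
    " description TestDescription\n"
    " network-object host TestHost\n"
    " network-object host TestHost\n"
    "object-group network TestNetwork2\n"
    " description TestDescription\n"
    " network-object host TestHost\n"
)


def parse_groups(lines):
    """Build CSV-style rows from a list of lines (e.g. from f.readlines())."""
    rows = []
    i = 0
    while i < len(lines):
        if lines[i].startswith("object-group network"):
            rows.append([lines[i].split()[2]])
            i += 1
            # Inspect following lines by index. Nothing is consumed, so the
            # next group header is still in place when this loop stops.
            while i < len(lines) and lines[i].startswith(
                (" network-object", " description")
            ):
                parts = lines[i].split()
                if parts[0] == "description":
                    rows.append(["", parts[0]])
                elif parts[1] in ("host", "object"):
                    rows.append(["", parts[1], parts[2]])
                else:
                    rows.append(["", "subnet", parts[1], parts[2]])
                i += 1
        else:
            i += 1
    return rows


rows = parse_groups(io.StringIO(SAMPLE).readlines())
for row in rows:
    print(row)
```

Because the inner loop only advances the index past lines it has actually handled, the outer loop picks up at the next "object-group" header instead of the line after it.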

As a side note, iobjects = iter(objects) is redundant. open already returns an iterator.
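A quick check of that side note, written as a small sketch that uses a temporary file created only for this demonstration:

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\n")
    path = tmp.name

with open(path) as f:
    # iter() on a file object hands back the very same object.
    same_object = iter(f) is f
    first = next(f)

os.remove(path)
print(same_object, repr(first))  # True 'a\n'
```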

I thought that may be the case. Do you have any recommendations on methods to achieve the right outcome? — Dr.Pepper, Dec 03 '17 at 14:10
Thanks. The issue I have is that the file in question can be thousands of lines long (a Cisco firewall configuration). I am trying to extract the object-groups into a CSV file. Whenever I reach a line that has object-group, I need to read the next X lines, which may be 1 or 50 lines. I need to have a dig around to see how to do this with readlines(), as I looked into this previously and couldn't find an answer — Dr.Pepper, Dec 03 '17 at 14:29
@Dr.Pepper Don't parse the file line by line. Use a regex `r'(object-group.*?)(?=$|object-group)'` (I'm sure this is far from the optimal regex). Make sure you are using the `re.DOTALL` flag. — DeepSpace, Dec 03 '17 at 14:53
Thanks DeepSpace. Matching the regex will only match on that one line, if I am not mistaken? How can I then read the following X lines to see if they start with "description" or "network-object", so I can put the description or network-object under the correct object-group in a CSV? I would still need to read the next line like I am currently doing, wouldn't I? — Dr.Pepper, Dec 03 '17 at 14:59
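On that last question: with re.DOTALL the pattern from the answer matches a whole block, not a single line, so each match can be split into its header and the lines under it. Here is a minimal sketch using shortened sample data. The dictionary layout is just one possible choice, and the pattern would split a block early if a description happened to contain the text "object-group".

```python
import re

# Shortened sample in the same shape as the file in the question.
SAMPLE = (
    "object-group network TestNetwork1\n"
    " description TestDescription\n"
    " network-object host TestHost\n"
    "object-group network TestNetwork2\n"
    " description TestDescription\n"
    " network-object host TestHost\n"
)

pattern = r'(object-group.*?)(?=$|object-group)'

groups = {}
for block in re.findall(pattern, SAMPLE, re.DOTALL):
    # First line is the group header, the rest belong to that group.
    header, *body = block.splitlines()
    groups[header.split()[2]] = [line.strip() for line in body if line.strip()]

for name, entries in groups.items():
    print(name, entries)
```

Each group name ends up mapped to its own description and network-object lines, so no next() call is needed to find out where a group ends.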
