
I have a script which I am using to parse a text file.
The script has a while loop in it, as there may be multiple lines following each header line. My current script has an issue where it is skipping lines. I am pretty sure it's something to do with my use of next() and its placement, but I can't figure it out.
This is an example of the text file:

object-group network TestNetwork1
 description TestDescription
 network-object host TestHost
 network-object host TestHost
 network-object host TestHost
 network-object host TestHost
object-group network TestNetwork2
 description TestDescription
 network-object host TestHost
object-group network TestNetwork3
 description TestDescription
 network-object host TestHost
object-group network TestNetwork4
 description TestDescription
 network-object host TestHost
object-group network TestNetwork5
 description TestDescription
 network-object host TestHost
object-group network TestNetwork6
 description TestDescription
 network-object host TestHost
object-group network TestNetwork7
 description TestDescription
 network-object host TestHost
object-group network TestNetwork8
 description TestDescription
 network-object host TestHost
object-group network TestNetwork9
 description TestDescription
 network-object host TestHost
object-group network TestNetwork10s
 description TestDescription
 network-object host TestHost

Here is the script:

import csv
Count = 0
objects = open("test-object-groups.txt", 'r+')
iobjects = iter(objects)

with open('object-group-test.csv', 'wb+') as filename2:
    writer2 = csv.writer(filename2)
    for lines in iobjects:
        if lines.startswith("object-group network"):
            print lines
            Count += 1
            linesplit = lines.split()
            writer2.writerow([linesplit[2]])
            while True:
                nextline = str(next(iobjects))
                if nextline.startswith(" network-object") or nextline.startswith(" description"):
                    nextlinesplit = nextline.split()
                    if nextlinesplit[1] <> "host" and nextlinesplit[1] <> "object" and nextlinesplit[0] <> "description":
                        writer2.writerow(['','subnet', nextlinesplit[1], nextlinesplit[2]])
                    elif nextlinesplit[1] == "host":
                        writer2.writerow(['',nextlinesplit[1], nextlinesplit[2]])
                    elif nextlinesplit[1] == "object":
                        writer2.writerow(['',nextlinesplit[1], nextlinesplit[2]])
                    elif nextlinesplit[0] == "description":
                        writer2.writerow(['',nextlinesplit[0]])

                elif nextline.startswith("object-group"):
                    break

print Count

Here is the output showing that it is skipping lines:

object-group network TestNetwork1

object-group network TestNetwork3

object-group network TestNetwork5

object-group network TestNetwork7

object-group network TestNetwork9

5

As you can see above, every other object-group line is being skipped (only 5 of the 10 groups are printed).
Any idea how to fix this?

Dr.Pepper

1 Answer

for lines in iobjects:
    ...
    ...
    while True:
        nextline = str(next(iobjects))

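Of course it will skip lines. You are calling next(iobjects) while the for loop is iterating over the same iobjects iterator, so every line that next() returns is consumed and never handled by the for loop. In your script, the while loop keeps calling next() until it reads the following object-group line and then breaks; that header has already been consumed, so the for loop never sees it, which is why every second group is missing.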

Consider this file:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

And this code:

with open('test.txt') as f:
    for line in f:
        print(line)
        if int(line.strip()) % 2 == 0:
            next(f)

The output is:

1

2

4

6

8

10

12

14

Every line that follows an even number is missing, because next(f) consumes it before the for loop gets to it.

Suggested solutions:

  1. Use itertools.tee to create two independent iterators over the file. Probably the least straightforward solution.

  2. Use f.readlines() and operate on a list of lines from the file instead of the iterator. This way you can work with indices.

  3. Use the more-itertools package, which provides a "peekable" iterator that lets you look at the next line without consuming it: https://stackoverflow.com/a/27698681/1453822

  4. Don't parse the file line by line. Use a regex to extract the information from the file block by block. For example, the regex r'(object-group.*?)(?=$|object-group)' will do (I'm sure this is far from the optimal regex). Make sure you are using the re.DOTALL flag, so that . also matches newlines and a single match can span a whole block.

    import re
    
    with open('test.txt') as f:
        file_content = f.read()
    
    for group in re.findall(r'(object-group.*?)(?=$|object-group)', file_content, re.DOTALL):
        print(group)
    
    # object-group network TestNetwork1
    #  description TestDescription
    #  network-object host TestHost
    #  network-object host TestHost
    #  network-object host TestHost
    #  network-object host TestHost
    # 
    # object-group network TestNetwork2
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork3
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork4
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork5
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork6
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork7
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork8
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork9
    #  description TestDescription
    #  network-object host TestHost
    # 
    # object-group network TestNetwork10s
    #  description TestDescription
    #  network-object host TestHost
    

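A minimal sketch of suggestion 2, assuming Python 3 and that member lines are indented with a leading space, as in the sample file. The helper name groups_from_lines and the inlined SAMPLE text are illustrative only. Because the position is an index into a list rather than a shared iterator, the inner loop can stop at the next object-group header without consuming it:

```python
import io

# Hypothetical excerpt in the question's format, inlined so the sketch runs on its own.
SAMPLE = """object-group network TestNetwork1
 description TestDescription
 network-object host TestHost
 network-object host TestHost
object-group network TestNetwork2
 description TestDescription
 network-object host TestHost
"""


def groups_from_lines(lines):
    """Hypothetical helper: yield (group_name, member_lines) using indices instead of next()."""
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("object-group network"):
            name = line.split()[2]
            members = []
            i += 1
            # Collect the indented lines that belong to this group.
            while i < len(lines) and lines[i].startswith(" "):
                members.append(lines[i].strip())
                i += 1
            # i now points at the next header (or the end); it is NOT skipped.
            yield name, members
        else:
            i += 1


# In the real script: with open("test-object-groups.txt") as f: lines = f.readlines()
lines = io.StringIO(SAMPLE).readlines()
for name, members in groups_from_lines(lines):
    print(name, members)
```

For a file of a few thousand lines, holding the whole list in memory is not a concern.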

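Suggestion 3 in sketch form. more_itertools.peekable offers a peek() method that returns the next item without consuming it; to keep this example dependency-free, the small Peekable class below is a hypothetical stand-in that implements only that part. It assumes Python 3 and member lines indented with a leading space, as in the sample file:

```python
import io


class Peekable:
    """Hypothetical minimal stand-in for more_itertools.peekable."""

    _EMPTY = object()  # sentinel meaning "nothing cached"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out a previously peeked item first, if there is one.
        if self._cache is not self._EMPTY:
            item, self._cache = self._cache, self._EMPTY
            return item
        return next(self._it)

    def peek(self, default=None):
        # Fetch the next item once and remember it so __next__ can return it later.
        if self._cache is self._EMPTY:
            try:
                self._cache = next(self._it)
            except StopIteration:
                return default
        return self._cache


# Hypothetical excerpt in the question's format.
SAMPLE = """object-group network TestNetwork1
 description TestDescription
 network-object host TestHost
object-group network TestNetwork2
 network-object host TestHost
"""

groups = {}
lines = Peekable(io.StringIO(SAMPLE))
for line in lines:
    if line.startswith("object-group network"):
        name = line.split()[2]
        groups[name] = []
        # Only consume the next line if it belongs to this group;
        # a new "object-group" header is left for the for loop.
        while lines.peek("").startswith(" "):
            groups[name].append(next(lines).strip())

print(groups)
```

This keeps the shape of the original script (a for loop plus an inner while loop); the only change is that the inner loop peeks before it consumes.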
As a side note, iobjects = iter(objects) is redundant. The file object returned by open is already an iterator, and calling iter() on it just returns the same object.
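A quick check of that side note, assuming Python 3; the temporary file is a hypothetical stand-in for test-object-groups.txt:

```python
import os
import tempfile

# Write a small throwaway file in the question's format.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("object-group network TestNetwork1\n description TestDescription\n")
    path = tmp.name

with open(path) as objects:
    # A file object is its own iterator: iter() hands back the very same object.
    same_object = iter(objects) is objects
    # So next() and "for line in objects" both advance one shared position.
    first = next(objects)
    rest = list(objects)

os.remove(path)
print(same_object, repr(first), rest)
```

Because there is only one position, any next() inside the loop body necessarily takes a line away from the for loop.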

DeepSpace
  • I thought that may be the case. Do you have any recommendations on methods to achieve the right outcome? – Dr.Pepper Dec 03 '17 at 14:10
  • wrote it 1 minute faster than me :) – ddor254 Dec 03 '17 at 14:11
  • @Dr.Pepper See updated answer with several solutions – DeepSpace Dec 03 '17 at 14:24
  • Thanks. The issue I have is that the file in question can be thousands of lines long (Cisco firewall configuration). I am trying to extract the object-groups into a CSV file. Whenever I reach a line that has object-group, I need to read the next "X" lines, which may be 1 or 50 lines. I need to have a dig around to see how to do this with readlines, as I looked into this previously and couldn't find an answer – Dr.Pepper Dec 03 '17 at 14:29
  • @Dr.Pepper Don't parse the file line by line. Use a regex `r'(object-group.*?)(?=$|object-group)'` (I'm sure this is far from the optimal regex). Make sure you are using the `re.DOTALL` flag. – DeepSpace Dec 03 '17 at 14:53
  • Thanks DeepSpace. Matching the regex will only match that one line, if I am not mistaken? How can I then read the following X lines to see if they start with "description" or "network-object", so I can put the description or network-object under the correct object-group in a CSV? I would still need to read the next line like I am currently doing, wouldn't I? – Dr.Pepper Dec 03 '17 at 14:59
  • @Dr.Pepper no, and no. See my answer. – DeepSpace Dec 03 '17 at 15:00
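Putting suggestion 4 together with the CSV output the question asks for: a sketch, assuming Python 3 (where the CSV file should be opened with open(..., "w", newline="") rather than the question's 'wb+'). Each regex match is a whole block, so its first line is the header and the remaining lines are that group's members; no next() calls are needed. The sample text (including the subnet and object lines) is hypothetical, and block_to_rows is an illustrative helper whose row layout follows the question's script, except that it also keeps the description text:

```python
import csv
import io
import re

# Hypothetical input; in the real script: open("test-object-groups.txt").read()
SAMPLE = """object-group network TestNetwork1
 description TestDescription
 network-object host TestHost
 network-object 10.0.0.0 255.255.255.0
object-group network TestNetwork2
 description TestDescription
 network-object object SomeObject
"""

# The answer's regex: with re.DOTALL, each match is one whole object-group block.
BLOCK_RE = re.compile(r'(object-group.*?)(?=$|object-group)', re.DOTALL)


def block_to_rows(block):
    """Hypothetical helper: turn one object-group block into CSV rows."""
    header, *members = block.strip().splitlines()
    rows = [[header.split()[2]]]  # group name, as in the question
    for member in members:
        parts = member.split()
        if parts[0] == "description":
            # Assumption: keep the description text (the original wrote only the keyword).
            rows.append(["", "description", " ".join(parts[1:])])
        elif parts[1] in ("host", "object"):
            rows.append(["", parts[1], parts[2]])
        else:
            # "network-object <address> <mask>" is treated as a subnet, as in the question.
            rows.append(["", "subnet", parts[1], parts[2]])
    return rows


out = io.StringIO()  # real script: open("object-group-test.csv", "w", newline="")
writer = csv.writer(out)
for block in BLOCK_RE.findall(SAMPLE):
    writer.writerows(block_to_rows(block))

print(out.getvalue())
```

Because the whole file is read into one string, this suits a config of a few thousand lines; memory use grows with file size.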