
I am trying to search through a list of files and extract the lines that start with "id". This occurs many times in each file, and often on the first line of the file.

The code I have written so far works, however it seems to miss the first line in each file (the first occurrence of 'id').

for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        for line in f:
            lines = f.readlines()
            a=0

            while a < len(lines):
                temp_array = lines[a].rstrip().split(",")
                if temp_array[0] == "id":
                    game_id = temp_array[1]

Any suggestions on how I can include this first line of text in the readlines? I tried changing a to -1 so it would include the first line of text (where a=0) but this didn't work.

EDIT:

I need to keep 'a' in my code as an index because I use it later on. The code I showed above was truncated. Here is more of the code as an example. Any suggestions on how else I can remove "for line in f:"?

for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        for line in f:
            lines = f.readlines()
            a=0

            while a < len(lines):
                temp_array = lines[a].rstrip().split(",")
                if temp_array[0] == "id":
                    game_id = temp_array[1]


                    for o in range(a+1,a+7,1):
                         if lines[o].rstrip().split(",")[1]== "visteam":
                            awayteam = lines[o].rstrip().split(",")[2]
                         if lines[o].rstrip().split(",")[1]== "hometeam":
                            hometeam = lines[o].rstrip().split(",")[2]
                         if lines[o].rstrip().split(",")[1]== "date":
                            date = lines[o].rstrip().split(",")[2]
                         if lines[o].rstrip().split(",")[1]== "site":
                            site = lines[o].rstrip().split(",")[2]
Lauren
  • 'for line in f' reads the first line. Then 'lines = f.readlines()' reads only the remaining lines, so the first line is missing from 'lines'. [Example of processing through a text file](https://stackoverflow.com/questions/17436709/python-loop-through-a-text-file-reading-data) – DarrylG Oct 29 '19 at 04:27
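
The effect described in the comment above can be reproduced without any real files. This is only a small sketch: `io.StringIO` stands in for the opened file, and the contents (`GAME1`, `AAA`, `BBB`) are made-up sample data, not taken from the question.

```python
import io

# Stand-in for an open file; the contents are made-up sample data.
f = io.StringIO("id,GAME1\ninfo,visteam,AAA\ninfo,hometeam,BBB\n")

first_seen = []
for line in f:                 # the loop itself consumes the first line...
    first_seen.append(line.rstrip())
    lines = f.readlines()      # ...so readlines() returns only what is left

# The "id" line ended up in `line`, not in `lines`, and the loop body
# ran exactly once because readlines() exhausted the file.
print(first_seen)
print(lines)
```

Since the "id" row is never in `lines`, changing the starting value of `a` cannot bring it back; dropping the outer `for line in f:` and calling `readlines()` directly avoids the problem.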

1 Answer

for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        for line in f:
            temp_array = line.rstrip().split(",")
            if temp_array[0] == "id":
                game_id = temp_array[1]

The above should work. It can also be made a bit faster, since there is no need to split every line into a list:

for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        for line in f:
            if line.startswith("id,"):
                temp_array = line.rstrip().split(",")
                game_id = temp_array[1]

You can use enumerate to keep track of the current line number. Having seen your edit to the question, here is another way:

for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        lines = f.readlines()

        # n plays the role of the index 'a' from the question
        for n, line in enumerate(lines):
            if line.startswith("id,"):
                game_id = line.rstrip().split(",")[1]

                # check up to six following lines, without running past the end of the file
                for o in range(n + 1, min(n + 7, len(lines))):
                    linedata = lines[o].rstrip().split(",")
                    if len(linedata) < 3:
                        continue  # too short to hold a key and a value
                    spec = linedata[1]

                    if spec == "visteam":
                        awayteam = linedata[2]
                    elif spec == "hometeam":
                        hometeam = linedata[2]
                    elif spec == "date":
                        date = linedata[2]
                    elif spec == "site":
                        site = linedata[2]

You should also consider using the csv module from the standard library for working with CSV files.
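
As a rough sketch of what that could look like, assuming the files follow the layout implied by the question (an `id` row followed by rows whose second field is `visteam`, `hometeam`, `date` or `site`): the function name `parse_games`, the dictionary keys, and the sample rows are all hypothetical. Unlike the code above, it does not limit itself to the six lines after each `id` row; it attaches any matching row to the most recent game.

```python
import csv
import io

def parse_games(f):
    """Collect one dict per game from an iterable of comma-separated lines."""
    # Map the field names in the file to the variable names used in the question.
    wanted = {"visteam": "awayteam", "hometeam": "hometeam",
              "date": "date", "site": "site"}
    games = []
    for row in csv.reader(f):
        if not row:
            continue  # skip blank lines
        if row[0] == "id" and len(row) > 1:
            games.append({"game_id": row[1]})  # start a new game
        elif games and len(row) >= 3 and row[1] in wanted:
            games[-1][wanted[row[1]]] = row[2]  # attach to the current game
    return games

# Made-up sample data standing in for one file.
sample = io.StringIO(
    "id,GAME1\n"
    "version,2\n"
    "info,visteam,AAA\n"
    "info,hometeam,BBB\n"
    "info,date,2019/04/04\n"
    "info,site,PARK1\n"
    "id,GAME2\n"
    "info,visteam,CCC\n"
)
games = parse_games(sample)
print(games)
```

With real files you would pass the open file object instead of `sample`, for example `with open(file2, newline="") as f: games = parse_games(f)`.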

Holy Mackerel
  • Hi thank you for your reply. I have added an edit to my question above. I'm not sure how to implement this as I was using 'a' to index my code later on. – Lauren Oct 29 '19 at 04:57
  • @Lauren You can use `enumerate()`. I have updated my answer with an example. – Holy Mackerel Oct 29 '19 at 06:14