I have a function that reads a large .txt file, line by line.

As a parameter, I pass the function the line index from which it should start reading the file.

First I call the function with 0 so that it begins at the start of the file. At the end I call the function again with a new argument, but when execution re-enters the function, the freshly passed index (which is now different) is 0 again in the for statement. :(

from __future__ import print_function
import os
import sys

file = open("file.txt").read().splitlines()

for i, line in enumerate(file):
    if file[i] == "@@@TC_FIN@@@":
        fin = i;
        #print (fin)

def AssembleTC(index):

   while index < fin:

       for index, line in enumerate(file):
           if "@@@ ID:" in line:
               print(file[index+1])
               break

       for index, line in enumerate(file):
           if file[index] == "@@@TC_FIN@@@":
               recursive = index;
               #print (recursive)
               break

       AssembleTC(recursive+1)

AssembleTC(0)

It is vital for me to keep the present for statement with the file[index] access pattern. I've read that I could skip lines with something like file.next(), but it doesn't work.

Is there any way to skip a given number of lines, or simply to start the next read from the updated index? Python 2.7.13. Thank you!

Marko
  • What are you trying to do to this poor file? Can you explain the desired result? – TemporalWolf Mar 22 '17 at 22:56
  • Are the lines a fixed size (in bytes/characters)? If so, you can use [`file.seek()`](https://docs.python.org/2.7/library/stdtypes.html#file.seek) to move the file object's current position. If not, you're plumb out of luck. You have to call `file.next()` to advance the iterator, or call `file.readline()` and discard the output until you get to the line you want. Note that the last two options consume the iterator, meaning you have to capture the skipped lines if you plan to need them later. – Matthew Cole Mar 22 '17 at 23:00
  • `for index, line in enumerate(file):` starts enumeration from the beginning of the list in `file` every time you do it. – tdelaney Mar 22 '17 at 23:01
  • It seems like you want to locate `"@@@ ID:"` as long as it appears before `"@@@TC_FIN@@@"`. Is that true? Do you want to do that multiple times? – tdelaney Mar 22 '17 at 23:03
  • Once I have passed the lines, I don't need them anymore. So when I call the function again, it's OK not to be able to access the previous lines. The code is just a sample. I have a big file with similar structures in it and I have to manipulate the content of each structure, one structure per function call. – Marko Mar 22 '17 at 23:03
  • Also, don't forget to [`file.close()`](https://docs.python.org/2.7/library/stdtypes.html#file.close) the file object when you're done or else you risk corrupting the file. – Matthew Cole Mar 22 '17 at 23:03
  • @MatthewCole I don't think he'll corrupt a read-only file. – tdelaney Mar 22 '17 at 23:04
  • @tdelaney As I said before, I have several identifications to make, then I copy some of the content between two tags, and so on. But @@@TC_FIN@@@ will always mark the end of the analysed structure and give the right index for the next function call. – Marko Mar 22 '17 at 23:05
  • @tdelaney, perhaps not. But there are a lot of [other good reasons](http://stackoverflow.com/questions/7395542/is-explicitly-closing-files-important) not to leave an open file object lying around too, so I'd say the advice stands. – Matthew Cole Mar 22 '17 at 23:06
  • @MatthewCole file.seek() seems OK because, as I explained before, I always want to locate the index of the same thing (@@@TC_FIN@@@). But after I locate it, could I give the found location to the for loop so that it starts from there? – Marko Mar 22 '17 at 23:10
  • @Marko: if you check the link to the Python2 documentation that I gave, you'll see that the first argument is `offset`... but it doesn't make it explicitly clear that the offset is measured in bytes, not lines, the same as `file.read(size)` expects a size in bytes. If your lines aren't equally sized in bytes, you won't be able to calculate how many bytes to advance to skip to the correct line. If they are equally sized in bytes, file.seek(num_lines * line_size) moves to num_lines from the start of the file. – Matthew Cole Mar 22 '17 at 23:15
  • I have a new idea. What if, after a function call, I deleted the file content up to the line that I need? Then it would be perfect for the analysis to start from index=0, because the file would be a new one without the previously processed data. Is there any way to delete and update the file without having to tell file open that the .txt was changed? – Marko Mar 22 '17 at 23:36
  • Thank you guys for all your comments! It was a useful conversation! – Marko Mar 23 '17 at 14:03
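
As a rough illustration of the point made in the comments (enumerate always restarts from the beginning of the list), here is a minimal sketch of one way to resume from an updated index while keeping the `lines[index]` access style. The names `process_blocks`, `lines`, and `sample` are made up for this example, and the sample data only assumes the two markers shown in the question. `enumerate` accepts a second `start` argument, so slicing the list and passing the same offset keeps `index` aligned with positions in the full list.

```python
from __future__ import print_function

FIN = "@@@TC_FIN@@@"

def process_blocks(lines):
    """Collect the line that follows each '@@@ ID:' marker (hypothetical helper)."""
    found = []
    start = 0
    while start < len(lines):
        fin = None
        # Slice from `start`, and tell enumerate to count from `start` as well,
        # so `index` is still a valid position in the full `lines` list.
        for index, line in enumerate(lines[start:], start):
            if "@@@ ID:" in line and index + 1 < len(lines):
                found.append(lines[index + 1])
            if lines[index] == FIN:
                fin = index
                break
        if fin is None:
            break  # no further terminator, nothing left to process
        start = fin + 1  # resume right after the terminator
    return found

# Made-up sample data standing in for file.txt
sample = ["@@@ ID:", "first", "junk", FIN, "@@@ ID:", "second", FIN]
print(process_blocks(sample))
```

The slice copies the remainder of the list on every pass, which may matter for a very large file; `itertools.islice(lines, start, None)` avoids the copy, although it still steps over the skipped elements.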

2 Answers

It's a large text file, so I think it would be worth revisiting the idea of reading it line by line. File objects keep track of where they are in the file, so a nested for loop over the same file object picks up where the outer loop left off. Generators use yield to pass results back to callers and are a good way to encapsulate functionality.

This example scans a file until it sees the ID, gathers lines until it sees the FIN, then hands the data back to the caller. It's a generator, so it can be called from a for loop to get all of the records in turn.

from __future__ import print_function

def my_datablock_iter(fileobj):
    # The outer and inner loops share one file iterator, so the inner
    # loop continues from wherever the outer loop stopped.
    for line in fileobj:
        # find ID
        if "@@@ ID:" in line:
            # build a list of lines until FIN is seen
            wanted = [line.strip()]
            for line in fileobj:
                line = line.strip()
                if line == "@@@TC_FIN@@@":
                    break
                wanted.append(line)
            # hand block back to user
            yield wanted

with open("file.txt") as fp:
    for datablock in my_datablock_iter(fp):
        print(datablock)
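
To try the generator idea without a real file, here is a small self-contained sketch that feeds it an in-memory stream. The function name `iter_datablocks` and the sample text are made up for the demonstration; `io.StringIO` is used only as a stand-in for an open file, on the assumption that it iterates line by line the same way.

```python
from __future__ import print_function
import io

def iter_datablocks(fileobj):
    """Yield one list of stripped lines per ID..FIN block (illustrative name)."""
    for line in fileobj:
        if "@@@ ID:" in line:
            wanted = [line.strip()]
            # Same iterator as the outer loop, so this continues
            # from the line right after the ID marker.
            for line in fileobj:
                line = line.strip()
                if line == "@@@TC_FIN@@@":
                    break
                wanted.append(line)
            yield wanted

# Made-up sample content; lines outside ID..FIN blocks are skipped.
sample = io.StringIO(
    u"header\n"
    u"@@@ ID: 1\n"
    u"alpha\n"
    u"@@@TC_FIN@@@\n"
    u"noise\n"
    u"@@@ ID: 2\n"
    u"beta\n"
    u"gamma\n"
    u"@@@TC_FIN@@@\n"
)

blocks = list(iter_datablocks(sample))
for block in blocks:
    print(block)
```

Each block comes back as its own list, so per-block processing can start from index 0 without tracking a global line number.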
tdelaney

I implemented my idea of erasing the lines I have already parsed, and it works very well. This only suits my own case, though, because I no longer need any of the data once I have processed it. For those who still need it, I think @tdelaney's code is a good choice, and I thank him for his answer!

Here is how I did it:

from __future__ import print_function
import os
import sys

initialCall = os.stat("test.txt").st_size

def AssembleTC(parameter):

  print("CALLED PARAMETER = " + str(parameter))
  if parameter == 0:
      sys.exit()
  else:
      file = open("test.txt").read().splitlines()
      for index, line in enumerate(file):
          if file[index] == "@@@TC_FIN@@@":
              fin = index;
              print ("FIN POSITION = " + str(fin))
              break

      check = os.stat("test.txt").st_size
      print("File size = " + str(check))

      while check > 1:
          for index, line in enumerate(file):
              if "@@@ TC NR" in line:
                  print(file[index+1])
                  break
          ok=0
          with open("test.txt","r") as textobj:
              mylist = list(textobj)
              del mylist[0:fin+1]
              ok=1

          if ok==1:    
              with open("test.txt", "w") as textobj:
                  for n in mylist:
                      textobj.write(n)

          print("OLD SIZE = " + str(check))
          check = os.stat("test.txt").st_size
          print("NEW SIZE = " + str(check) + "\n")

          AssembleTC(check)

AssembleTC(initialCall)
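
A possible variation on the same idea, sketched under the assumption that test.txt itself does not have to shrink: drop the processed lines from an in-memory list instead of rewriting the file after every block. The helper name `pop_block` and the sample list are invented for this example. After each call the remaining data starts at index 0 again, as in the code above.

```python
from __future__ import print_function

FIN = "@@@TC_FIN@@@"

def pop_block(lines):
    """Remove and return the first block, up to and including FIN (hypothetical helper)."""
    try:
        fin = lines.index(FIN)
    except ValueError:
        fin = len(lines) - 1  # no terminator: treat the rest as the last block
    block = lines[:fin + 1]
    del lines[:fin + 1]  # processed lines are discarded, like the file rewrite
    return block

# Made-up sample data; in practice this could be open("test.txt").read().splitlines()
lines = ["@@@ TC NR", "one", FIN, "@@@ TC NR", "two", FIN]

titles = []
while lines:
    block = pop_block(lines)
    for index, line in enumerate(block):
        if "@@@ TC NR" in line and index + 1 < len(block):
            titles.append(block[index + 1])
            break

print(titles)
```

This keeps the whole file in memory once, which trades disk writes for RAM and avoids the recursion and the call to sys.exit().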
Marko