0

I have a file with contents like this (I don't wish to change the contents of the file in any way):

.
.
lines I don't need.
.
.
abc      # I know where it starts and the data can be anything, not just abc
efg      # I know where it ends.
.
.
lines I don't need.
.
.

I know the line numbers (indices) at which my useful data starts and ends. The useful lines can contain any unpredictable data. I want to build a list out of this data, like this:

[['a','b','c'],['e','f','g']]

Please note that there are no spaces between a, b and so on in the input file, so I guess the split() function won't work. What would be the best way to achieve this in Python?

PKBEST
  • Possible duplicate of [How to jump to a particular line in a huge text file?](https://stackoverflow.com/questions/620367/how-to-jump-to-a-particular-line-in-a-huge-text-file) – FHTMitchell Jun 06 '18 at 09:25
  • please tell us in what format you know what part you want, i.e. do you know line number, or after 'n' bytes, or are you matching start and end expressions? – akash Jun 06 '18 at 09:49
  • @AkashGupta I have mentioned this in the edited question, I know the line numbers of start and end. I don't know how to split it. – PKBEST Jun 06 '18 at 09:52

4 Answers

0

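Use seek to jump to a specific part of the file:

# filename, start_index and end_index are values you supply
with open(filename) as file:
    file.seek(start_index)
    data = file.read(end_index - start_index)

This gives you the part of the file between the given offsets (positions in the file, not line numbers).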
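
A minimal sketch of that idea, assuming the file is opened in binary mode so the offsets are exact byte counts and that the content is UTF-8 (or plain ASCII) text. The name `read_span` and its parameters are hypothetical, chosen only for illustration:

```python
def read_span(path, start_index, end_index):
    """Return the text between two byte offsets as one character list per line.

    start_index is inclusive, end_index is exclusive.
    """
    # Binary mode makes seek() and read() work on exact byte counts.
    with open(path, "rb") as f:
        f.seek(start_index)
        raw = f.read(end_index - start_index)
    # Assumption: the file holds UTF-8 (or plain ASCII) text.
    text = raw.decode("utf-8")
    # splitlines() drops the line endings; list() splits each line into characters.
    return [list(line) for line in text.splitlines()]
```

This only helps if the offsets are already known; it does not translate line numbers into offsets.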

akash
0

You can just iterate over the file and ignore the lines you don't want. Then split each line you keep into its characters.

for line in file:
    if(IsLineThatYouWant(line)):
        # split("") raises ValueError (empty separator); list() splits into characters
        characters = list(line.strip())
        DoMoreThingsWithChars(characters)
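
Since the question gives line numbers rather than line contents, the same loop can be driven by a counter. This is a sketch under the assumption that line numbers are 1-based and inclusive; `chars_between` is a hypothetical name, not something from the answer above:

```python
def chars_between(path, start_line_no, end_line_no):
    """Collect lines start_line_no..end_line_no (1-based, inclusive) as character lists."""
    result = []
    with open(path) as f:
        # enumerate() supplies the line number while the file is read lazily.
        for line_no, line in enumerate(f, start=1):
            if line_no > end_line_no:
                break  # nothing useful after this point, stop reading
            if line_no >= start_line_no:
                # Remove only the trailing newline so other whitespace is kept.
                result.append(list(line.rstrip("\n")))
    return result
```

The file is never loaded whole, and the contents of the other lines are never inspected.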
Arnaud VdP
  • 1
    How would the user know which line he wants? Let's assume the file is "abcefg": how would I check whether I want a line if I've never seen it before? All the user knows is from where to where, not what the lines contain. – akash Jun 06 '18 at 09:44
  • split("") gives me a ValueError: empty separator – PKBEST Jun 06 '18 at 09:46
  • My bad, it seems `list(line)` does what I wanted split("") to do. – Arnaud VdP Jun 06 '18 at 09:52
  • My 'solution' assumes the user knows what the lines he wants look like, and not at what line numbers they appear. E.g. the user wants all lines starting with 'abc' – Arnaud VdP Jun 06 '18 at 09:53
0

You can read all the lines and then narrow it down:

with open('myfile.txt') as f:
    lines = [line.strip() for line in f]

Now take only the lines you need, assuming the block always starts with a line that is exactly "abc" and ends with a line that is exactly "efg":

lines = lines[lines.index('abc'):lines.index('efg')+1]

If you need more flexible ways to narrow down the lines, you need to be more specific in your question. Anyway, this solution is fine if you know for sure the file fits in memory. For larger files you will have to be more sophisticated and drop lines "on the fly":

lines_to_keep = []
started = False
with open('myfile.txt') as f:
    for line in f:
        line = line.strip()
        if 'abc' in line:
            started = True
        if started:
            lines_to_keep.append(line)
        if started and 'efg' in line:
            break

After all that is done, you can split each line in the list any way you want:

lines = [list(line) for line in lines]
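
If the start and end are known as line numbers rather than as text, `itertools.islice` can do the narrowing instead of `index`. A minimal sketch, assuming 1-based inclusive line numbers; `slice_lines` is a hypothetical helper name:

```python
from itertools import islice


def slice_lines(path, start_line_no, end_line_no):
    """Return lines start_line_no..end_line_no (1-based, inclusive) as character lists."""
    with open(path) as f:
        # islice counts from 0 and excludes the stop position,
        # so line N sits at position N - 1.
        wanted = islice(f, start_line_no - 1, end_line_no)
        # Remove only the trailing newline; the data itself is left untouched.
        return [list(line.rstrip("\n")) for line in wanted]
```

`islice` still reads the skipped lines one by one, but it stops at the end line and never holds the whole file in memory.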
Ofer Sadan
  • 1
    What if "abc" also occurs before the part where we expected it? – akash Jun 06 '18 at 09:41
  • You're correct, if he knows exactly what byte/char to start on - your solution is the best one. If he knows just the exact text of the line to start on, then mine is. Unfortunately OP wasn't very specific with details in his question – Ofer Sadan Jun 06 '18 at 09:43
  • True, enough details weren't provided. – akash Jun 06 '18 at 09:45
0

After merging all the bits and pieces from different answers and comments, this is what I did to solve my problem:

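# infile is the already-opened input file
mylist = []
infile.seek(start_byte)
# number of useful lines, counting both the start and the end line
for i in range(end_line_no - start_line_no + 1):
    mylist.append(list(infile.readline().strip()))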

I had to calculate start_byte by hand, though, by counting every character (spaces included) and adding 1 for each '\n'. Please let me know if there is a better way.
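
One possible way to avoid counting by hand is to let Python add up the line lengths. This is only a sketch, assuming 1-based line numbers and a file read in binary mode so that each line's length is its true byte count (including `\r\n` endings, if any); `line_start_offset` is a hypothetical name:

```python
def line_start_offset(path, line_no):
    """Return the byte offset at which line `line_no` (1-based) begins."""
    offset = 0
    # Binary mode: len(line) is the exact number of bytes, newline included.
    with open(path, "rb") as f:
        for current, line in enumerate(f, start=1):
            if current == line_no:
                return offset
            offset += len(line)
    raise ValueError("line_no is not a line in this file")
```

The returned value can be passed to seek on a file opened in binary mode. It still costs one pass over the preceding lines, so it pays off mainly when the same offset is reused.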

PKBEST