
I need to get the value of the previous line in a file and compare it with the current line as I'm iterating through the file. The file is HUGE, so I can't read it all at once or randomly access a line number with linecache, because that library function still reads the whole file into memory anyway.

EDIT: I'm so sorry, I forgot to mention that I have to read the file backwardly.

EDIT2

I have tried the following:

import linecache

f = open("filename", "r")
for line in reversed(f.readlines()):  # this doesn't work: there are too many lines to read into memory
    pass

line = linecache.getline("filename", num_line)  # this also doesn't work, due to the same problem
Lim H.
  • You mean just the immediately preceding line? You can't just save it as you go? – Fred Larson Jun 28 '13 at 20:29
  • You would be more likely to get help if you showed us what you've written so far. – That1Guy Jun 28 '13 at 20:30
  • Could you provide what you've tried? Looping over a file line by line is possible, and assigning the line to a variable is possible, so what exactly is going wrong? By the way, how big is HUGE? – ChrisP Jun 28 '13 at 20:30
  • I can't, because the file has too many lines: when I tried to put them all into a list, for example, Python complains that it cannot allocate the necessary memory for the list. – Lim H. Jun 28 '13 at 20:32
  • If you're accessing the lines sequentially anyway, have you tried storing two consecutive lines at a time? – Diana Jun 28 '13 at 20:33
  • @ChrisP I have no idea how big HUGE is; that's another problem. It's an exercise for a class, and I need to run my script against my teacher's data. It gets through most of the data sets until it hits this big file. I'll try to show the code. – Lim H. Jun 28 '13 at 20:34
  • @Diana I'm sorry for the dumb question, but how do you read and store two consecutive lines at a time? – Lim H. Jun 28 '13 at 20:37
  • @Lim, I posted my solution as an answer. – Diana Jun 28 '13 at 20:56
  • What do you mean by "*read the file backwardly*" ? – Jon Clements Jun 28 '13 at 20:58
  • I meant to read from the last line to the beginning of the file. – Lim H. Jun 28 '13 at 21:01
  • Is it part of your assignment to iterate over the lines in reverse? There's not any really easy way to do that in Python (you can do it using low-level file operations and seeks, but it will be messy). Can you adjust your algorithm to work with a forward iteration? – Blckknght Jun 28 '13 at 21:08
  • @Blckknght Yes, I saw this question http://stackoverflow.com/questions/5896079/python-head-tail-and-backward-read-by-lines-of-a-text-file/5896210#5896210 before I came here but it's indeed messy. I'll adjust my algorithm but I'm still curious whether there are any built-in features to accomplish the same task. – Lim H. Jun 28 '13 at 21:13
  • Does "reading the file backwards" mean "read the bytes in the file in reverse order" or "read the characters in the file in reverse order" or "read the last line, then the second-last line, and so on, where 'lines' are separated by some specified line-terminator"? That's a complicated task whichever way you define it if you can't load the whole file at once, and there's a good chance that there's a better way to do what you're trying to do. Why do you think you have to read it "backwards"? – Henry Keiter Jun 28 '13 at 21:32
  • @HenryKeiter I meant "read the last line, then the second-last line". Please excuse my explanation (it's 5 am at my place); I assumed that, given the context of reading lines, the phrase would be self-explanatory. There are many good ways to accomplish my bigger task as a whole, but as my code works in 90% of cases, I was hoping I could tweak it a little to get it to work on this one as well. – Lim H. Jun 28 '13 at 21:44
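Blckknght's comment notes that reverse iteration is possible with low-level seeks. A minimal sketch of that idea, not taken from any answer here: it reads fixed-size blocks starting from the end of the file, so memory use is bounded by the chunk size plus the longest line rather than by the file size. The function name `reverse_lines` and the `chunk_size` default are hypothetical, and it assumes `\n` line endings (with `\r\n` files each line would keep a trailing `\r`).

```python
import os

def reverse_lines(path, chunk_size=8192):
    """Yield the lines of `path` as bytes, without the newline, from last to first."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        pos = size
        # First piece of the previous chunk: possibly the tail of a line whose
        # beginning lies in a part of the file we have not read yet.
        remainder = b""
        at_end = True
        while pos > 0:
            read_size = min(chunk_size, pos)
            pos -= read_size
            f.seek(pos)
            chunk = f.read(read_size) + remainder
            if at_end and chunk.endswith(b"\n"):
                chunk = chunk[:-1]  # a final newline does not start another line
            at_end = False
            lines = chunk.split(b"\n")
            remainder = lines.pop(0)  # may be incomplete, so carry it to the next chunk
            for line in reversed(lines):
                yield line
        if size > 0:
            yield remainder  # the very first line of the file
```

Because it yields lines last-first, the "keep the previous line in a variable" pattern from the answers below works unchanged; decode each line (e.g. `line.decode("utf-8")`) if you need text rather than bytes.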

3 Answers


Just save the previous line before you move on to the next one:

prevLine = ""
with open("filename") as f:
    for line in f:
        # do some work here with prevLine and line
        prevLine = line

This stores the previous line in prevLine while you loop (on the first iteration it is still the empty string).

Edit: apparently the OP needs to read this file backwards.

...and after about an hour of research, I have failed multiple times to do it within the memory constraints.

Here you go, Lim. That guy knows what he's doing; here is his best idea:

General approach #2: Read the entire file, store position of lines

With this approach, you also read through the entire file once, but instead of storing the entire file (all the text) in memory, you only store the binary positions inside the file where each line started. You can store these positions in a data structure similar to the one storing the lines in the first approach.

Whenever you want to read line X, you have to re-read the line from the file, starting at the position you stored for the start of that line.

Pros: almost as easy to implement as the first approach.
Cons: can take a while to read large files.
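A minimal sketch of approach #2, assuming the list of line offsets (one integer per line) fits in memory even though the text does not, and that the file is UTF-8. The names `line_offsets` and `lines_backwards` are hypothetical. The file is opened in binary mode and offsets are computed by summing line lengths, because in Python 3 calling `tell()` while iterating over a text-mode file raises an error.

```python
from array import array

def line_offsets(path):
    """Return the byte offset at which each line of `path` starts."""
    offsets = array("q")  # 8 bytes per entry, more compact than a list of Python ints
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets

def lines_backwards(path, encoding="utf-8"):
    """Yield the lines of `path` from last to first, re-reading each one from disk."""
    offsets = line_offsets(path)  # first pass: record where every line starts
    with open(path, "rb") as f:
        for start in reversed(offsets):  # second pass: jump to each line, last one first
            f.seek(start)
            yield f.readline().decode(encoding)
```

To compare each line with the one read just before it, keep the previous value in a variable, exactly as in the forward loop above. The two passes over the file are what the "can take a while" caveat refers to.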

Stephan
  • Thanks so much. But I forgot to mention that I have to read the file backwardly. – Lim H. Jun 28 '13 at 20:56
  • Magic. I'm new to Python, so although I knew a file was iterable, using [::-1] just never crossed my mind. Thank you. – Lim H. Jun 28 '13 at 21:02
  • You can't slice a file, so your edited code doesn't make any sense. – Blckknght Jun 28 '13 at 21:02
  • @Stephan: I just tried this and I get a type error at the `[::-1]` part: `TypeError: 'file' object has no attribute '__getitem__'`. Is this a Python 2.7 vs. Python 3.0 issue? Or did you open the file in a specific way? I'm using Python 2.7 and I open my file as `my_file = open("file.txt","r")` – Diana Jun 28 '13 at 21:04
  • @Stephan sorry, I took your word for it, but I just tried and found out that you can't slice a file. – Lim H. Jun 28 '13 at 21:04
  • Using `reversed` will need the whole data of the file in memory at once (you're doing it via `readlines`). The question says that the file is too large to do that. Of course, I'm not sure that there is any good way to iterate over a file in reverse, but that seems to be what the questioner wants. – Blckknght Jun 28 '13 at 21:06
  • @LimH. the alternative to this memory-intensive solution is to use Python to create a reversed file and read through it normally, which you should be able to do – Stephan Jun 28 '13 at 21:07
  • @Stephan How would you create a reversed file in this situation? – Lim H. Jun 28 '13 at 21:20
  • @LimH. I am not on a roll today, I'm just gonna stop embarrassing myself – Stephan Jun 28 '13 at 21:33
  • @LimH. I'm done now; I posted a link to the best answer I could find after a ton of googling – Stephan Jun 28 '13 at 21:40
  • @Stephan NP. You've been a big help. Your link is very informative, but if it comes to the low level of file reading using seek, I think I'm gonna change my approach entirely. – Lim H. Jun 28 '13 at 21:46

@Lim, here's how I would write it (in reply to the comments):

def do_stuff_with_two_lines(previous_line, current_line):
    print("--------------")
    print(previous_line)
    print(current_line)

with open('my_file.txt', 'r') as my_file:
    # read the first line up front (empty string if the file is empty)
    current_line = my_file.readline()

    for line in my_file:
        previous_line = current_line
        current_line = line

        do_stuff_with_two_lines(previous_line, current_line)
Diana
  • Thank YOU. I'm terribly sorry but I forgot to mention that I have to read the file backwardly. – Lim H. Jun 28 '13 at 20:57

I'd write a simple generator for the task:

def pairwise(fname):
    with open(fname) as fin:
        prev = next(fin, None)  # first line, or None if the file is empty
        if prev is None:
            return
        for line in fin:
            yield prev, line
            prev = line

Or, you can use the pairwise recipe from the itertools documentation:

import itertools

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)  # Python 3; on Python 2 use itertools.izip to stay lazy
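On Python 3.10 and later this recipe ships in the standard library as `itertools.pairwise`. A small usage sketch that prefers the built-in and falls back to the recipe on older versions; `changed_lines` is a hypothetical example of "comparing each line with the previous one" (here, reporting lines that differ from their predecessor). Since `tee` only buffers the single line by which its two iterators are apart, the file is still consumed one line at a time.

```python
import itertools

try:
    from itertools import pairwise  # available since Python 3.10
except ImportError:
    def pairwise(iterable):
        # Fallback for older versions: the recipe from the itertools docs
        a, b = itertools.tee(iterable)
        next(b, None)
        return zip(a, b)

def changed_lines(path):
    """Yield the 1-based numbers of lines that differ from the line before them."""
    with open(path) as f:
        # pairs are (line 1, line 2), (line 2, line 3), ...; the file is read lazily
        for number, (prev, line) in enumerate(pairwise(f), start=2):
            if line != prev:
                yield number
```

Note that this only walks the file forward; it does not address the reverse-order requirement from the question's edit.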
mgilson