You could use the grouper recipe, izip(*[iterator]*2)
to cluster the lines in df
into groups of 2. Then, to find the minimum pair of lines, use min
and its key
parameter to specify the proxy to use for comparison. In this case, for every pair of lines, (p, l)
, we want to use the float of the second line, float(l)
, as the proxy:
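# Python 2 code: IT.izip and tuple-unpacking lambdas do not exist in Python 3
import itertools as IT

with open('filepath') as df:
    # Pair up consecutive lines, then pick the pair whose second line is smallest
    previous, minline = min(IT.izip(*[df]*2),
                            key=lambda (p, l): float(l))
    minline = float(minline)
    print(previous)
    print(minline)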
prints
Over
0.5678
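If you are on Python 3, the same idea still works, but izip is gone (the built-in zip is already lazy) and a lambda can no longer unpack a tuple in its parameter list. Here is a minimal sketch under those assumptions. The sample data is made up for illustration, and io.StringIO stands in for the real file:

```python
import io

# Hypothetical sample data: a label line followed by a numeric line, repeated.
sample = io.StringIO("Under\n1.2345\nOver\n0.5678\nEven\n2.5\n")

# zip(*[sample] * 2) pairs up consecutive lines; zip is lazy in Python 3.
# The key indexes into the pair because lambda (p, l): ... is a syntax error
# in Python 3.
previous, minline = min(zip(*[sample] * 2), key=lambda pair: float(pair[1]))

previous = previous.strip()  # drop the trailing newline
minline = float(minline)     # float() ignores surrounding whitespace
print(previous)  # Over
print(minline)   # 0.5678
```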
An explanation of the grouper recipe:
To understand the grouper recipe, first look at what happens if df
were a list:
In [1]: df = [1, 2]
In [2]: [df]*2
Out[2]: [[1, 2], [1, 2]]
In Python, when you multiply a list by a positive integer n
, you get n
(shallow)
copies of the items in the list. Thus, [df]*2
makes a list with two copies of df
inside.
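A quick way to convince yourself that the copies are shallow is to check object identity. This is only an illustrative sketch (written for Python 3), and the names inner and outer are made up for the example:

```python
inner = [1, 2]
outer = [inner] * 2

# Both slots refer to the very same list object, not to independent copies.
same_object = outer[0] is outer[1]
print(same_object)  # True

# Because of that, a change made through one slot shows up in the other.
outer[0].append(3)
print(outer)  # [[1, 2, 3], [1, 2, 3]]
```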
Now consider zip(*[df]*2).
The *
used in zip(*...)
has a special meaning. It tells Python to unpack the list following the *
into arguments to be passed to zip
. Thus, zip(*[df]*2)
is exactly equivalent to zip(df, df)
:
In [3]: zip(df, df)
Out[3]: [(1, 1), (2, 2)]
In [4]: zip(*[df]*2)
Out[4]: [(1, 1), (2, 2)]
A more complete explanation of argument unpacking is given by SaltyCrane here.
Take note of what zip
is doing.
zip(*[df]*2)
peels off the first element of both copies, (both 1's in this case),
and forms the tuple, (1,1). Then it peels off the second element of both copies, (both 2's), and forms the tuple (2,2). It returns a list with these tuples inside.
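The sessions above are from Python 2, where zip returns a list. As a small sketch for Python 3, where zip returns a lazy iterator, you can wrap the call in list() to see the same tuples:

```python
df = [1, 2]

# In Python 3, zip returns an iterator, so list() is needed to display it.
explicit = list(zip(df, df))
unpacked = list(zip(*[df] * 2))

print(explicit)  # [(1, 1), (2, 2)]
print(unpacked)  # [(1, 1), (2, 2)]
```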
Now consider what happens when df
is an iterator. An iterator is sort of like a list, except an iterator is good for only a single pass. Once items are pulled out of the iterator, it cannot be rewound.
For example, a file handle is an iterator.
Suppose we have a file with lines
1
2
3
4
In [8]: f = open('data')
You can pull items out of the iterator f
by calling next(f)
:
In [9]: next(f)
Out[9]: '1\n'
In [10]: next(f)
Out[10]: '2\n'
In [11]: next(f)
Out[11]: '3\n'
In [12]: next(f)
Out[12]: '4\n'
Each time we call next(f)
, we get the next line from the file handle, f
.
If we call next(f)
again, we'd get a StopIteration exception, indicating the iterator is empty.
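If you want to try this without creating a file, here is a rough Python 3 sketch that uses io.StringIO as a stand-in for open('data'). The stand-in is my assumption; it iterates line by line in the same way:

```python
import io

f = io.StringIO("1\n2\n3\n4\n")  # stand-in for open('data')

lines = [next(f) for _ in range(4)]
print(lines)  # ['1\n', '2\n', '3\n', '4\n']

# A fifth call raises StopIteration because the iterator is exhausted.
try:
    next(f)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True

# next() also accepts a default, returned instead of raising.
print(next(f, None))  # None
```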
Now let's see how the grouper recipe behaves on f
:
In [14]: f = open('data') # Notice we have to open the file again, since the old iterator is empty
In [15]: [f]*2
Out[15]:
[<open file 'data', mode 'r' at 0xa028f98>,
<open file 'data', mode 'r' at 0xa028f98>]
[f]*2
gives us a list containing two references to the same iterator f
.
In [16]: zip(*[f]*2)
Out[16]: [('1\n', '2\n'), ('3\n', '4\n')]
zip(*[f]*2)
peels off the first item from the first iterator, f
, and then
peels off the first item from the second iterator, f
. But the iterator is the
same f
both times! And since iterators are good for a single pass (you can
never go back), you get different items each time you peel off an item. zip
is
calling next(f)
each time to peel off an item. So the first tuple is
('1\n', '2\n')
. Likewise, zip
then peels off the next item from the first
iterator f
, and the next item from the second iterator f
, and forms the
tuple ('3\n', '4\n')
. Thus, zip(*[f]*2)
returns
[('1\n', '2\n'), ('3\n', '4\n')]
.
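The same trick generalizes from pairs to groups of any size n. Below is a minimal Python 3 sketch. The helper name grouper and its signature are my own choice for illustration, and io.StringIO again stands in for the file. Note the call to iter(): without it, passing a plain list would give you (x, x) tuples as in the list example earlier.

```python
import io

def grouper(iterable, n):
    """Group items into n-tuples; a trailing partial group is dropped."""
    iterator = iter(iterable)    # ensure one single-pass iterator
    return zip(*[iterator] * n)  # n references to that same iterator

f = io.StringIO("1\n2\n3\n4\n")
pairs = list(grouper(f, 2))
print(pairs)  # [('1\n', '2\n'), ('3\n', '4\n')]

# With 7 items and n=3, the leftover item (6) is silently discarded.
triples = list(grouper(range(7), 3))
print(triples)  # [(0, 1, 2), (3, 4, 5)]
```

If the leftover items matter, itertools.zip_longest can be used in place of zip to pad the last group with a fill value.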
That's really all there is to the grouper recipe. Above, I chose to use IT.izip
instead of zip
so that Python would return an iterator instead of a list of tuples. This would save a lot of memory if the file had a lot of lines in it. The difference between zip
and IT.izip
is explained more fully here.
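In Python 3 the built-in zip behaves like Python 2's IT.izip: it returns an iterator and only pulls items as they are requested. A small sketch of that laziness, again using io.StringIO as an assumed stand-in for a real file:

```python
import io

f = io.StringIO("1\n2\n3\n4\n")
pairs = zip(*[f] * 2)  # nothing has been read from f yet

first = next(pairs)    # reads exactly two lines
remaining = f.read()   # the rest of the data is still unread

print(first)            # ('1\n', '2\n')
print(repr(remaining))  # '3\n4\n'
```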