Finding smallest float in file then printing that and line above it

Question

My data file looks like this:

3.6-band 
6238
Over
0.5678
Over
0.6874
Over
0.7680
Over
0.7834

What I want to do is to pick out the smallest float and the word directly above it and print those two values. I have no idea what I'm doing. I've tried

df=open('filepath')
for line in df:
    df1=line.split()
    df2=min(df1)

Which is my attempt at at least trying to isolate the smallest float. Problem is it's just giving me the last value. I think that's a problem with python not knowing to start over with the iteration, but again...no idea what I'm doing. I tried df2=min(df1.seek(0)) with no success, got an error saying no attribute seek. So that's what I've tried so far, I still have no idea how to print the row that would come before the smallest float. Suggestions/help/advice would be appreciated, thanks.

As a side note: this data file is an example of a larger one with similar characteristics, but the word 'Over' could also be 'Under', that's why I need to have it printed as well.

Ashwini Chaudhary · Accepted Answer · 2013-07-07T22:11:42.640

2

Store the items in a list of lists of [word, num] pairs, then apply min to that list. Use the key parameter of min to specify which item of each pair is used for the comparison:

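with open('abc') as f:
    # Pair each line with the line that follows it: [word, num]
    lis = [[line.strip(), next(f).strip()] for line in f]
    # Compare the pairs by the numeric value of their second item
    minn = min(lis, key=lambda x: float(x[1]))
    print("\n".join(minn))

prints

Over
0.5678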

Here lis looks like this:

[['3.6-band', '6238'], ['Over', '0.5678'], ['Over', '0.6874'], ['Over', '0.7680'], ['Over', '0.7834']]
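For anyone who wants to try this without creating a file, here is a minimal Python 3 sketch of the same idea. The sample text is copied from the question; `SAMPLE` and `smallest_pair` are names made up for this illustration, and it assumes the input has an even number of lines.

```python
import io

# Sample data copied from the question (an in-memory stand-in for the real file).
SAMPLE = """3.6-band
6238
Over
0.5678
Over
0.6874
Over
0.7680
Over
0.7834
"""

def smallest_pair(lines):
    """Return the [word, number] pair whose number is smallest."""
    it = iter(lines)
    # Each step takes one line as the word and pulls the next line as the number.
    pairs = [[line.strip(), next(it).strip()] for line in it]
    return min(pairs, key=lambda pair: float(pair[1]))

word, number = smallest_pair(io.StringIO(SAMPLE))
print(word)
print(number)
```

The numbers stay as strings in the pairs; only the key function converts them to float for the comparison.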

edited Jul 07 '13 at 22:11

answered Jul 07 '13 at 21:50

Ashwini Chaudhary


This is very helpful. I'd prefer it to be more in the format of unutbu's output though. Thanks in any case. This is good to know as a newbie. – Matt Jul 07 '13 at 22:05
@Matt you can simply do `print "\n".join(minn)` to get that output. – Ashwini Chaudhary Jul 07 '13 at 22:11
Hmmm...there's so much to pick up on. After reading the docs for unutbu's solution I think I prefer yours as it is a little easier to grasp at my level. – Matt Jul 07 '13 at 22:15
Why do you call your list `lis`? At least call it `l` or `my_list` if you're too lazy to name it after its content. Also, `minn`? Why not `smallest` or `minimum`? Good answer and I +1'd, but that's just terrible naming. – Markus Meskanen Jul 07 '13 at 22:29

score 2 · Answer 2 · edited May 23 '17 at 10:32

You could use the grouper recipe, izip(*[iterator]*2), to cluster the lines in df into groups of 2. Then, to find the minimum pair of lines, use min and its key parameter to specify the proxy to be used for comparison. In this case, for every pair of lines, (p, l), we want to use the float of the second line, float(l), as the proxy:

import itertools as IT  # Python 2 only: izip was removed in Python 3
with open('filepath') as df:
    previous, minline = min(IT.izip(*[df]*2),
                            key=lambda (p, l): float(l))  # compare pairs by the second line
    minline = float(minline)
    print(previous)
    print(minline)

prints

Over

0.5678
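Note that the snippet above only runs on Python 2, because both IT.izip and tuple-unpacking lambda parameters were removed in Python 3. A rough Python 3 equivalent might look like the sketch below, where the built-in zip is already lazy and the lambda indexes the pair instead of unpacking it. The function name `min_pair` and the in-memory sample are assumptions for illustration.

```python
import io

def min_pair(lines):
    """Group lines two at a time and return (previous_line, smallest_float)."""
    it = iter(lines)  # both zip arguments must share this one iterator
    previous, minline = min(zip(*[it] * 2), key=lambda pair: float(pair[1]))
    return previous.strip(), float(minline)

# In-memory stand-in for the file from the question.
sample = io.StringIO("3.6-band\n6238\nOver\n0.5678\nOver\n0.6874\n")
print(min_pair(sample))
```

Unlike the original, this strips the newline from the word before returning it.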

An explanation of the grouper recipe:

To understand the grouper recipe, first look at what happens if df were a list:

In [1]: df = [1, 2]

In [2]: [df]*2
Out[2]: [[1, 2], [1, 2]]

In Python, when you multiply a list by a positive integer n, you get n (shallow) copies of the items in the list. Thus, [df]*2 makes a list with two copies of df inside.

Now consider zip(*[df]*2)

The * used in zip(*...) has a special meaning. It tells Python to unpack the list following the * into arguments to be passed to zip. Thus, zip(*[df]*2) is exactly equivalent to zip(df, df):

In [3]: zip(df, df)
Out[3]: [(1, 1), (2, 2)]

In [4]: zip(*[df]*2)
Out[4]: [(1, 1), (2, 2)]

A more complete explanation of argument unpacking is given by SaltyCrane here.

Take note of what zip is doing. zip(*[df]*2) peels off the first element of both copies, (both 1's in this case), and forms the tuple, (1,1). Then it peels off the second element of both copies, (both 2's), and forms the tuple (2,2). It returns a list with these tuples inside.

Now consider what happens when df is an iterator. An iterator is sort of like a list, except that an iterator is good for only a single pass. Once items have been pulled out of the iterator, it can never be rewound.

For example, a file handle is an iterator. Suppose we have a file with lines

1
2
3
4

In [8]: f = open('data')

You can pull items out of the iterator f by calling next(f):

In [9]: next(f)
Out[9]: '1\n'

In [10]: next(f)
Out[10]: '2\n'

In [11]: next(f)
Out[11]: '3\n'

In [12]: next(f)
Out[12]: '4\n'

Each time we call next(f), we get the next line from the file handle, f. If we called next(f) again, we'd get a StopIteration exception, indicating the iterator is empty.

Now let's see how the grouper recipe behaves on f:

In [14]: f = open('data')  # Notice we have to open the file again, since the old iterator is empty

In [15]: [f]*2
Out[15]: 
[<open file 'data', mode 'r' at 0xa028f98>,
 <open file 'data', mode 'r' at 0xa028f98>]

[f]*2 gives us a list containing two references to the same iterator f (not two independent copies).

In [16]: zip(*[f]*2)
Out[16]: [('1\n', '2\n'), ('3\n', '4\n')]

zip(*[f]*2) peels off the first item from the first iterator, f, and then peels off the first item from the second iterator, f. But the iterator is the same f both times! And since iterators are good for a single pass (you can never go back), you get different items each time you peel off an item. zip is calling next(f) each time to peel off an item. So the first tuple is ('1\n', '2\n'). Likewise, zip then peels off the next item from the first iterator f, and the next item from the second iterator f, and forms the tuple ('3\n', '4\n'). Thus, zip(*[f]*2) returns [('1\n', '2\n'), ('3\n', '4\n')].
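The contrast between a list and an iterator can be checked directly. This is a small Python 3 sketch (so zip is wrapped in list() to show its contents); the variable names are illustrative only.

```python
items = ['1\n', '2\n', '3\n', '4\n']

# Two references to the same list: zip iterates each argument from the start,
# so every item is paired with itself.
from_list = list(zip(*[items] * 2))

# Two references to the same iterator: every next() call advances the one
# shared position, so consecutive items end up paired together.
shared = iter(items)
from_iterator = list(zip(*[shared] * 2))

print(from_list)
print(from_iterator)
```

Wrapping a sequence in iter() is what turns the "duplicate" behaviour into the grouping behaviour.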

That's really all there is to the grouper recipe. Above, I chose to use IT.izip instead of zip so that Python would return an iterator instead of a list of tuples. This would save a lot of memory if the file had a lot of lines in it. The difference between zip and IT.izip is explained more fully here.

This looks nice. Can you explain line 3 to me? I'd like to understand what's going on, especially with the wildcards. Thanks! — Matt, Jul 07 '13 at 22:02
Actually, I just noticed the docs link. I'll read that first. — Matt, Jul 07 '13 at 22:04
The grouper recipe you spread around is really great! It really nicely combines a lot of familiar python concepts. — PascalVKooten, May 11 '14 at 08:39
Why isn't `b=range(16); list(zip(*[b]*4))` not dividing it? I wouldn't expect duplicates? (in Python 3) — PascalVKooten, May 11 '14 at 08:53
@PascalvKooten: `range` is not a one-pass iterator. (See the error message raised by `next(b)`.) To fix, use `list(zip(*[iter(b)]*4))`. — unutbu, May 11 '14 at 08:57

score 1 · Answer 3 · edited Jul 07 '13 at 22:16

You can't call min on a single number:

min(number)

It needs at least two values to compare (or a single iterable of values):

min(num1, num2)

If your file contains only numbers, one per line, you can use this code:

with open('filepath') as file:
    Num1 = float(file.readline())  # start with the first number

    for line in file:
        Num2 = float(line)
        Num1 = min(Num1, Num2)  # keep the smaller of the two

If you have the "Over"s then you can skip every second line.
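One way to skip every second line, sketched in Python 3 with itertools.islice. It assumes the numbers sit on the 2nd, 4th, 6th, ... lines as in the question, and the names `smallest_number` and `sample` are illustrative. Note that it finds only the number, not the word above it.

```python
import io
from itertools import islice

def smallest_number(lines):
    """Running minimum over every second line (the 2nd, 4th, ...)."""
    numbers = islice(lines, 1, None, 2)  # start at index 1, then step by 2
    smallest = float(next(numbers))
    for line in numbers:
        smallest = min(smallest, float(line))
    return smallest

# In-memory stand-in for the file from the question.
sample = io.StringIO("3.6-band\n6238\nOver\n0.5678\nOver\n0.6874\n")
print(smallest_number(sample))
```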

edited Jul 07 '13 at 22:16

Ashwini Chaudhary


answered Jul 07 '13 at 21:50

Guy Yagev


score 0 · Answer 4 · answered Jul 07 '13 at 21:59

You need to read all lines of the file, perhaps with the file object's readlines() method or a loop like the one you already have, and then for each line read the number (if it is a number) and compare it to the "best so far" value.

It looks like you don't really need split(). What you do need to do is check whether each line starts with a digit. If so, you can get the number with float(line). Maybe float(line.strip()) if whitespace is causing trouble. If the line doesn't start with a digit, keep it in a temporary variable. If the next line proves to offer a lower number than the best-so-far value, you can copy that temporary value into a variable for the tentative output.
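A minimal Python 3 sketch of that approach, assuming the label always sits on the line just before its number. One caveat: the first line of the question's data, '3.6-band', starts with a digit but is not a number, so the sketch uses try/except around float() instead of only checking the first character. The function name `find_smallest` and the sample lines are made up for illustration.

```python
def find_smallest(lines):
    """Return (label, value) for the smallest number, or None if there is none."""
    best = None    # (label, value) of the best-so-far number
    label = None   # most recent non-numeric line (the temporary variable)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        try:
            value = float(line)
        except ValueError:
            label = line  # not a number: remember it as a possible label
            continue
        if best is None or value < best[1]:
            best = (label, value)
    return best

sample = ["3.6-band", "6238", "Over", "0.5678", "Under", "0.6874"]
print(find_smallest(sample))
```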

score 0 · Answer 5 · answered Jul 07 '13 at 22:30

I see some interesting solutions above. I would go for this straightforward solution. One problem remains: integers are accepted as well, because float() parses them too. Does anyone have a solution for this?

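    with open('myfile.txt') as df:
        lines = df.readlines()
    minval = float('inf')
    for n, line in enumerate(lines):
        try:
            val = float(line)  # NB! like this, also integers will be taken.
        except ValueError:
            continue  # not a number (e.g. 'Over'), skip it
        if val < minval:
            minval = val
            i_min = n
    word = lines[i_min-1]  # the line directly above the smallest number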
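One possible way to skip integers, offered as a sketch rather than a definitive answer: try int() first and treat a line as a float only when int() rejects it but float() accepts it. The helper name `parse_strict_float` and the sample list are made up for this example (Python 3).

```python
def parse_strict_float(text):
    """Return float(text) if text is a float but not an integer literal, else None."""
    text = text.strip()
    try:
        int(text)
        return None  # parses as an integer, so skip it
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return None  # not a number at all

lines = ["3.6-band", "6238", "Over", "0.5678", "Over", "0.6874"]
values = [(parse_strict_float(s), i) for i, s in enumerate(lines)]
# Keep only the lines that parsed as floats, then take the smallest (value, index).
minval, i_min = min((v, i) for v, i in values if v is not None)
print(lines[i_min - 1], minval)
```

Be aware that float() also accepts strings such as 'inf' and 'nan', so a real data file containing those would need an extra check.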
