2

Let's say I have a list L=[1.1, 1.8, 4.4, 5.2]. For some integer, n, I want to know whether L has a value val with n-1<val<n+1, and if so I want to know the index of val.

The best I can do so far is to define a generator

x = (index for index,val in enumerate(L) if n-1<val<n+1)

and check to see whether it has an appropriate value using try... except. So let's assume I'm looking for the smallest n>=0 for which such a value exists...

L=[1.1, 1.8, 4.4, 5.2]
n=0
while True:
    x = (index for index,val in enumerate(L) if n-1<val<n+1)
    try:
        index=next(x)
        break
    except StopIteration:
        n+=1
print n,index

1 0

In reality, I'm doing a more complicated task. I'll want to be able to take an n, find the first index, and if it doesn't exist, I need to do something else.

This doesn't seem like particularly clean code to me. Is there a better way? I feel like numpy probably has the answer, but I don't know it well enough.

Joel
  • 22,598
  • 6
  • 69
  • 93
  • possible duplicate of [First Python list index greater than x?](http://stackoverflow.com/questions/2236906/first-python-list-index-greater-than-x) – Antoine Pelisse Jan 19 '15 at 23:29
  • don't think so - want first greater than n-1, less than n+1 and how to handle if it doesn't exist (quickly) – Joel Jan 19 '15 at 23:36
  • It's not very clear what you expect if the item doesn't exist – Antoine Pelisse Jan 19 '15 at 23:41
  • If the item doesn't exist, in my example, I just increase `n` by 1. More generally I want to either get the index, or if it doesn't exist I want to perform some other action. – Joel Jan 19 '15 at 23:43
  • But n is an input, why would you make it an output? Anyways, you can use `next((index for index,val in enumerate(L) if n-1 – Antoine Pelisse Jan 19 '15 at 23:44
  • your question says you want to know "the" index of the value. What if there is more than one? All indicies or just the first? Also, how large is your list, if it's not large then you probably don't need np, if it's very large than memory and speed start to dictate the best answer. – Phil Cooper Jan 19 '15 at 23:48
  • Because I want to know the first `n` for which such a value exists. – Joel Jan 19 '15 at 23:48
  • @PhilCooper just the first, thanks. – Joel Jan 19 '15 at 23:49
  • Just copy the list to integer with rounding down and look for the first matching integer. If you have a .00 then is it will never satisfy the condition so need to deal with that. – paparazzo Jan 21 '15 at 15:00

5 Answers5

4

If L is sorted, you could use bisect.bisect_left to find the index i for which all L[< i] < n <= all L[>= i].

Then

if n - L[i-1] < 1.0:
    val = L[i-1]
elif L[i] - n < 1.0:
    val = L[i]
else:
    val = None     # no such value found

Edit: Depending on your data, what you want to accomplish, and how much time you want to spend writing a clever algorithm, sorting may or may not be a good solution for you; and before I see too many more O(n)s waved around, I would like to point out that his actual problem seems to involve repeatedly probing for various values of n - which would pretty rapidly amortize the initial sorting overhead - and that his suggested algorithm above is actually O(n**2).

@AntoinePelisse: by all means, let's do some profiling:

from bisect import bisect_left, bisect_right
from functools import partial
import matplotlib.pyplot as plt
from random import randint, uniform
from timeit import timeit

#blues    
density_col_lin = [
    (0.000, 0.502, 0.000, 1.000),
    (0.176, 0.176, 0.600, 1.000),
    (0.357, 0.357, 0.698, 1.000),
    (0.537, 0.537, 0.800, 1.000)
]

# greens
density_col_sor = [
    (0.000, 0.502, 0.000, 1.000),
    (0.176, 0.600, 0.176, 1.000),
    (0.357, 0.698, 0.357, 1.000),
    (0.537, 0.800, 0.537, 1.000)
]

def make_data(length, density):
    max_ = length / density
    return [uniform(0.0, max_) for _ in range(length)], max_

def linear_probe(L, max_, probes):
    for p in range(probes):
        n = randint(0, int(max_))
        for index,val in enumerate(L):
            if n - 1.0 < val < n + 1.0:
                # return index
                break

def sorted_probe(L, max_, probes):
    # initial sort
    sL = sorted((val,index) for index,val in enumerate(L))
    for p in range(probes):
        n = randint(0, int(max_))
        left  = bisect_right(sL, (n - 1.0, max_))
        right = bisect_left (sL, (n + 1.0, 0.0 ), left)
        if left < right:
            index = min(sL[left:right], key=lambda s:s[1])[1]
            # return index

def main():
    densities = [0.8, 0.2, 0.08, 0.02]
    probes    = [1, 3, 10, 30, 100]
    lengths   = [[]                   for d in densities]
    lin_pts   = [[[] for p in probes] for d in densities]
    sor_pts   = [[[] for p in probes] for d in densities]

    # time each function at various data lengths, densities, and probe repetitions
    for d,density in enumerate(densities):
        for trial in range(200):
            print("{}-{}".format(density, trial))

             # length in 10 to 5000, with log density
            length = int(10 ** uniform(1.0, 3.699))
            L, max_ = make_data(length, density)
            lengths[d].append(length)

            for p,probe in enumerate(probes):
                lin = timeit(partial(linear_probe, L, max_, probe), number=5) / 5
                sor = timeit(partial(sorted_probe, L, max_, probe), number=5) / 5
                lin_pts[d][p].append(lin / probe)
                sor_pts[d][p].append(sor / probe)

    # plot the results
    plt.figure(figsize=(9.,6.))
    plt.axis([0, 5000, 0, 0.004])

    for d,density in enumerate(densities):
        xs = lengths[d]
        lcol = density_col_lin[d]
        scol = density_col_sor[d]

        for p,probe in enumerate(probes):
            plt.plot(xs, lin_pts[d][p], "o", color=lcol, markersize=4.0)
            plt.plot(xs, sor_pts[d][p], "o", color=scol, markersize=4.0)

    plt.show()

if __name__ == "__main__":
    main()

which results in

enter image description here

x-axis is number of items in L, y-axis is amortized time per probe; green dots are sorted_probe(), blue are linear_probe().

Conclusions:

  • runtimes for both functions are remarkably linear with respect to length
  • for a single probe into L, presorting is about 4 times slower than iterating
  • the crossover point seems to be about 5 probes; for fewer than that, linear search is faster, for more, presorting is faster.
Hugh Bothwell
  • 55,315
  • 8
  • 84
  • 99
  • I can't assume sorted. – Joel Jan 19 '15 at 23:36
  • 1
    @Joel: then either you sort it, or you have to compare every list item. – Hugh Bothwell Jan 19 '15 at 23:39
  • interesting that you would recommend using a O(n lg n) solution before O(n) – Antoine Pelisse Jan 19 '15 at 23:40
  • 1
    @AntoinePelisse: depending on the relative constant and the length of his list, yes, the "slow" method may well be faster. – Hugh Bothwell Jan 19 '15 at 23:42
  • @HughBothwell Could you provide an example? I'd be very curious to investigate one? Sorting means that you have to read all values from the list while going one-by-one is going to be cache-optimal and most of the time won't even go for a full read. – Antoine Pelisse Jan 19 '15 at 23:47
  • @HughBothwell I am really missing why you feel O(n ln n) is at all necessary. Maybe you answered before OP said he just wanted the first element that satified the result. Worst case should be O(n) Average should be O(n/2) since you are just looking for one (first satisfying value) I would -1 because I think this is the wrong approach but then I'd +1 because most people overlook bisect and it's nice to have it called out. – Phil Cooper Jan 20 '15 at 01:06
  • Of course, if he wants to probe multiple times (let's call it `p`), he going to end-up with: `O((n + p) lg n)` for sorting against `O(np)` for simple search. It's important to describe the problem correctly :-) – Antoine Pelisse Jan 21 '15 at 18:57
2

Here's a solution that doesn't rely on try... except and is relatively easy-to-read. As a result it "feels cleaner" to me, but that's always going to have an element of subjectivity to it.

def where_within_range( sequence, lower, upper ):
    for index, value in enumerate( sequence ):
        if lower < value < upper: return index

L = [ 1.1, 1.8, 4.4, 5.2 ]

import itertools
for n in itertools.count():
    index = where_within_range( L, n - 1, n + 1 )
    if index != None: break

print n, index

If you'd prefer to avoid the repeated-function-call overhead you could instead do it as follows, which once again uses the StopIteration exception but by using itertools.count and a return statement, (again, "somehow") ends up seeming cleaner to me. Perhaps it's because there's only one statement in each part of the try...except... clause (there's not much rational basis for that feeling, admittedly).

import itertools
def find_joel_root( sequence ):
    for n in itertools.count():
        solutions = ( index for index, value in enumerate( sequence ) if n - 1 < value < n + 1 )
        try: return n, next( solutions )
        except StopIteration: pass

L = [ 1.1, 1.8, 4.4, 5.2 ]
n, index = find_joel_root( L )
print n, index
jez
  • 14,867
  • 5
  • 37
  • 64
2

I have an interesting thought, by using defaultdict and build the index with the values (n-1) and (n+1), it will require looping the list once, and thereafter just compare the key/values, like this:

from collections import defaultdict

L = [1.1, 1.8, 4.4, 5.2]

x = defaultdict(dict)
for idx, item in enumerate(L):
    x[int(item)] = {int(item-1): item-1, int(item+1): item+1, 'index':idx}

Usage:

n = 5

x[n].get(n-1) < n < x[n].get(n+1) and x[n]['index']
Out[8]: 3

n = 2

x[n].get(n-1) < n < x[n].get(n+1) and x[n]['index']
Out[10]: False

Explanation:

As if:

1) True and Index returns Index

2) False and Index will return False

Because you are going to input n as an integer, if first part is True it will return the second part index value. If first part fails, it returns False.

This will return the LAST occurrence of n, in case you need the FIRST occurrence of n, just reversed the list and Index:

...
l = len(L)
for idx, item in enumerate(reversed(L)):
    x[int(item)] = {int(item-1): item-1, 
                    int(item+1): item+1, 
                    'index': l-idx-1}
...
Anzel
  • 19,825
  • 5
  • 51
  • 52
1

Now,when I think,that I finally understand your task : just find the minimum value in the array and it's index - n will be equal to cell(mininum). Or even simpler :

n,index = int(min(L)),L.index(min(L))
ig-melnyk
  • 2,769
  • 2
  • 25
  • 35
1

l can be a list or numpy array:

next(((i,v) for i,v in enumerate(l) if n-1<v<n+1))

uses generators and stops on the first value.

Phil Cooper
  • 5,747
  • 1
  • 25
  • 41