1

Let's say I have a dataframe with a column of closing prices, and a separate (not included in the dataframe) list of max's like so:

Closes      Max's
1           3
2           5
3           7
4           6
5           2
4
3
2
1

What is a pythonic way to add another column to the dataframe, where it grabs the next biggest number, comparing each close to the numbers in the list? Such that, the expected output would be:

Closes      Next_Biggest
1           2
2           3
3           5
4           5
5           6
4           5
3           5
2           3
1           2

Something pseudo like:

df['Next_Biggest'] = i where df['Closes'] > i and < i for i in Max's 

...or maybe to sort the Max's list from smallest to largest and then somehow loop through each Closes and compare one at a time to see if it's less than the current Max it's trying to compare to? Help! Thanks!

wildcat89
  • 1,159
  • 16
  • 47

3 Answers3

2

One possible solution (for small datasets) is to find the minimum value of all the values in Max's that are greater than each value in Closes:

closes = [1, 2, 3, 4, 5, 4, 3, 2, 1]
maxs = [3, 5, 6, 7, 2]
nextbig = [min([m for m in maxs if m > c]) for c in closes]
print(nextbig)

Output:

[2, 3, 5, 5, 6, 5, 5, 3, 2]

A more efficient alternative would be a nested loop using a sorted Max's list:

closes = [1, 2, 3, 4, 5, 4, 3, 2, 1]
maxs = [3, 5, 6, 7, 2]
nextbig = [c for c in closes]
maxs.sort()
for m in maxs:
    for i, c in enumerate(closes):
        if m > c and nextbig[i] == c:
            nextbig[i] = m
print(nextbig)

Output:

[2, 3, 5, 5, 6, 5, 5, 3, 2]
Nick
  • 138,499
  • 22
  • 57
  • 95
1

Use a nested loop like so:

import pandas as pd
# example dataframe without maxs inside the frame
d = {'Closes': [1, 2, 3, 4, 5, 4, 3, 2, 1]}
# maxs as a list
maxs = [3,5,7,6,2]
# sort the list from least to greatest
maxs.sort()
# have a container for new column
nextBiggest = []
# set up data frame
dataFrame = pd.DataFrame(data=d)
# convert data frame object to a list
dList = dataFrame.values.tolist()
# using a nested loop, find next biggest and put into nextBiggest list
for element in dList:
    for value in element:
        for max in maxs:
            if max > value:
                nextBiggest.append(max)
                break
# add nextBiggest list as a column to the dataframe
dataFrame['Next Biggest'] = nextBiggest
# display results
print(dataFrame)

Output:

   Closes  Next Biggest
0       1             2
1       2             3
2       3             5
3       4             5
4       5             6
5       4             5
6       3             5
7       2             3
8       1             2
Jamin
  • 1,362
  • 8
  • 22
1

Here's something using generator expressions that shouldn't have problems with very large lists. It basically creates two generators, one that cycles over its sequence every time the other generator advances once, and which(the former) terminates the first time the matching condition is found. (The top answer in the reference below is very helpful)

def find_next_largest(closes, maxes):
    """Searches through one list trying to find the next largest value
    from another list."""
    maxes = sorted(maxes)
    foo = (i for i in closes)  #generator for closes, the very large list
    for x in range(len(closes)):
        test = next(foo)
        bar = next((test, val) for val in maxes if val > test) #see SO reference below
        yield bar

To use this function:

closes = [1,2,3,4,5,4,3,2,1]
maxes = [2,3,4,6,7]
zed = find_next_largest(closes, maxes)
result = []
while True:
    try:
        result.append(next(zed))
    except StopIteration:
        break

This gives the following result:

[(1, 2), (2, 3), (3, 4), (4, 6), (5, 6), (4, 6), (3, 4), (2, 3), (1, 2)]

Based mainly on: Get the first item from an iterable that matches a condition

neutrino_logic
  • 1,289
  • 1
  • 6
  • 11