create new column based on next closest ranking pandas

Question

I have a list of products that are for sale on a subscription basis. the prices vary per region (about 15 regions). I'm trying to find the next closest priced product available on that day (not the cheapest on that day). My data looks like this

data = [['29/10/20', 400, 300, 2, 1], 
       ['29/10/20', 250, 400, 1, 2], 
       ['29/10/20', 600, 600, 3, 3],
       ['30/10/20', 800, 500, 3, 2]
       ['30/10/20', 200, 800, 1, 3], 
       ['30/10/20', 550, 300, 2, 1] 

df = pd.DataFrame(data, columns = ['date', 'east price', 'west price', 'east position', 'west position'])

I would like my ouput to look like

date     east_price nearest_east_price west_price nearest_west_price 
29/10/20 400        250                300        300 
29/10/20 250        250                400        300
29/10/20 600        400                600        400
30/10/20 800        550                500        300
30/10/20 250        250                800        500
30/10/20 550        250                300        300

i'm not sure whether to do this bt the position information that is available or is there another way to do this. i've been stuck on this all day. i also have the problem for the cheapest product on that day, i'd like it to return its value. can anyone help? i'm new to programming so might be missing the obvious

note, I have submitted an edit to your question. There is some inconsistency with the column names between creating dataframe and ouput. when create the `data` list of lists, you were also missing a bracket. As such, I have updated that in my answer. — David Erickson, Jun 22 '20 at 20:45
You have given us your output. Thanks. What do you exactly want to do. Please explain in words. There maybe multiple way to get it done. — wwnde, Jun 22 '20 at 21:04

David Erickson · Accepted Answer · 2020-07-02T07:59:03.600

EDIT: (2-July-2020). The OP wanted the output, so that if there are duplicate prices, then the nearest price cannot be the same.

See new solution below which has added np.where and bfill()

import pandas as pd
import numpy as np
data = [['29/10/20', 400, 300, 2, 1], 
       ['29/10/20', 250, 400, 1, 2], 
       ['29/10/20', 600, 600, 3, 3],
       ['30/10/20', 800, 500, 3, 2],
       ['30/10/20', 200, 800, 1, 3], 
       ['30/10/20', 550, 300, 2, 1]]

df = pd.DataFrame(data, columns = ['date', 'east_price', 'west_price', 'east_position', 'west_position'])

for col in df.columns:
    if '_price' in col:
        df = df.sort_values(col)
        nearest_col = f'nearest_{col}'
        df[nearest_col] = (np.where((df.shift(-1)[col] == df[col]), np.nan, df.shift(-1)[col]))
        df[nearest_col] = df[nearest_col].bfill()
df

output:

    date        east price  west price  east position   west position   nearest_east_price  nearest_west_price
0   29/10/20    400        300          2               1               550.0   400.0
5   30/10/20    550        300          2               1               600.0   400.0
1   29/10/20    250        400          1               2               400.0   500.0
3   30/10/20    800        500          3               2               NaN     600.0
2   29/10/20    600        600          3               3               800.0   800.0
4   30/10/20    200        800          1               3               250.0   NaN

(Old Answer)

Simply use .sort_values() to sort and .shift to compare to the next row.

data = [['29/10/20', 400, 300, 2, 1], 
       ['29/10/20', 250, 400, 1, 2], 
       ['29/10/20', 600, 600, 3, 3],
       ['30/10/20', 800, 500, 3, 2],
       ['30/10/20', 200, 800, 1, 3], 
       ['30/10/20', 550, 300, 2, 1]]

df = pd.DataFrame(data, columns = ['date', 'east_price', 'west_price', 'east_position', 'west_position'])

Solution #1

df = df.sort_values('east_price')
df['nearest_east_price'] = df.shift(-1)['east_price']
df = df.sort_values('west_price')
df['nearest_west_price'] = df.shift(-1)['west_price']
df

Solution #2 - Even better, if you have many columns with a similar pattern to their column name, you can loop through the column based off similar name and do it that way:

for col in df.columns:
    if '_price' in col:
        df = df.sort_values(col)
        nearest_col = f'nearest_{col}'
        df[nearest_col] = df.shift(-1)[col]

Solution #3: Consolidating Solution #2:

for col in df.columns:
    if '_price' in col: df[f'nearest_{col}'] = df.sort_values(col).shift(-1)[col]
df

output:

    date        east price  west price  east position   west position   nearest_east_price  nearest_west_price
0   29/10/20    400        300          2               1               550.0   300.0
5   30/10/20    550        300          2               1               600.0   400.0
1   29/10/20    250        400          1               2               400.0   500.0
3   30/10/20    800        500          3               2               NaN     600.0
2   29/10/20    600        600          3               3               800.0   800.0
4   30/10/20    200        800          1               3               250.0   NaN

Thank you! I wasn't aware of the shift function, saved me so much time. Is there a way if the nearest price is the same as the original price to find the next cheapest value. I know an if statement is needed but not sure where to implement this — em456, Jun 23 '20 at 12:07

create new column based on next closest ranking pandas

1 Answers1