pdDataFrame getting earliest times out of multiple columns and rows of data

Question

I'm writing an Alexa skill to tell me when the next buses are due to take me to work. I have reached a point where I have a pandas dataframe with the necessary information. It looks like this:

   BusStop    1st    2nd    3rd    4th    5th BusLine
10  myStop  20:05  20:16  20:28  20:38  20:52       A
3   myStop  16:07  17:07  18:13  19:12  20:12       E
15  myStop  18:26  18:36  18:46  18:58  19:25       K

But I want to transform it to include only earliest times so Alexa can tell me "The A bus is coming in 5 minutes, the K bus in 20 minutes" or something to that effect.

   BusStop    1st  BusLine
10  myStop  16:07   E
3   myStop  17:07   E
15  myStop  18:13   E

I have a way of doing this but it seems quite clumsy and wondered if there is a better way to do this. I have it working with the below code:

ranked_buses_to_work = pd.DataFrame()

for i in [ '1st','2nd','3rd','4th','5th']:
    temp_df = buses_to_work_df[['BusStop', i, 'BusLine']]
    temp_df.columns = ['BusStop', 'BusTime', 'BusLine']
    ranked_buses_to_work = ranked_buses_to_work.append(temp_df)
    ranked_buses_to_work  = ranked_buses_to_work .sort_values(by=['BusTime'], inplace=True)

Is there a better way to do this?

[Never call DataFrame.append or pd.concat inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/questions/36489576/why-does-concatenation-of-dataframes-get-exponentially-slower) — Parfait, Dec 17 '18 at 22:01
The output was correct. The A at 16:07 was just a human error (I edited the first table to show what I wanted - I hadn't got working code at that point). — user3535074, Dec 17 '18 at 22:25

score 2 · Accepted Answer · answered Dec 17 '18 at 22:28

import pandas as pd
from io import StringIO

# Sample data
df = pd.read_fwf(StringIO(
"""BusStop    1st    2nd    3rd    4th    5th BusLine
myStop  20:05  20:16  20:28  20:38  20:52       A
myStop  16:07  17:07  18:13  19:12  20:12       E
myStop  18:26  18:36  18:46  18:58  19:25       K
"""), index=False)


# transform the wide dataframe into a tall dataframe sorted by time
dfm = df.melt(id_vars = ["BusStop", "BusLine"], var_name = 'order', value_name="departure") \
         .sort_values('departure')

# set the currrent time and number of entries to report
# ProabblyuUse se proper date time variables instead
time_now = '16:10'
how_many = 5


# select entries > time_now and get the number wanted        
dfm[dfm.departure > time_now][['BusLine', 'departure']].head(how_many) 


#Out[156]: 
#  BusLine departure
#4       E     17:07
#7       E     18:13
#2       K     18:26
#5       K     18:36
#8       K     18:46

I figured that there would be something like the melt function - that's what I was looking for. — user3535074, Dec 17 '18 at 22:34

Yuca · Answer 2 · 2018-12-17T22:27:34.947

1

Not clear what the actual logic of the desired output is, but this works:

Find the 3 smallest values in your dataframe (using numpy):

import numpy as np
idx = df.values.ravel().argsort()[:3]

Recover the index 'coordinates' of the smallest values

idxa = np.unravel_index(idx, df.shape)

Create the output column using zip to build the index in a friendly format for pandas:

df['1st'] = [df.iloc[x] for x in list(zip(idxa[0], idxa[1]))]

Create the correct 'BusLine' column (the one matching the min time)

df['BusLine'] = [df.iloc[x,-2] for x in idxa[0]]

Present the result in a clean format

ans = df.iloc[:,[0,-1,-2]]

Output:

    BusStop   1st BusLine
10  myStop  16:07       E
3   myStop  17:07       E
15  myStop  18:13       E

edited Dec 17 '18 at 22:27

answered Dec 17 '18 at 22:05

Yuca

6,010
3
22
42

That's certainly a way I wouldn't have thought to do this! Thanks! – user3535074 Dec 17 '18 at 22:29
this is quite the 'artisan' approach and I'm sure there are multiple ways to do this, so hopefully other people share their views! – Yuca Dec 17 '18 at 22:32

pdDataFrame getting earliest times out of multiple columns and rows of data

2 Answers2

Output: