0

I'm writing an Alexa skill to tell me when the next buses are due to take me to work. I have reached a point where I have a pandas dataframe with the necessary information. It looks like this:

   BusStop    1st    2nd    3rd    4th    5th BusLine
10  myStop  20:05  20:16  20:28  20:38  20:52       A
3   myStop  16:07  17:07  18:13  19:12  20:12       E
15  myStop  18:26  18:36  18:46  18:58  19:25       K

But I want to transform it to include only earliest times so Alexa can tell me "The A bus is coming in 5 minutes, the K bus in 20 minutes" or something to that effect.

   BusStop    1st  BusLine
10  myStop  16:07   E
3   myStop  17:07   E
15  myStop  18:13   E

I have a way of doing this but it seems quite clumsy and wondered if there is a better way to do this. I have it working with the below code:

ranked_buses_to_work = pd.DataFrame()

for i in [ '1st','2nd','3rd','4th','5th']:
    temp_df = buses_to_work_df[['BusStop', i, 'BusLine']]
    temp_df.columns = ['BusStop', 'BusTime', 'BusLine']
    ranked_buses_to_work = ranked_buses_to_work.append(temp_df)
    ranked_buses_to_work  = ranked_buses_to_work .sort_values(by=['BusTime'], inplace=True)

Is there a better way to do this?

user3535074
  • 1,268
  • 8
  • 26
  • 48
  • 1
    why A have 16:07 – BENY Dec 17 '18 at 22:00
  • [Never call DataFrame.append or pd.concat inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/questions/36489576/why-does-concatenation-of-dataframes-get-exponentially-slower) – Parfait Dec 17 '18 at 22:01
  • Are you sure your expected output is correct? – Brian Dec 17 '18 at 22:05
  • The output was correct. The A at 16:07 was just a human error (I edited the first table to show what I wanted - I hadn't got working code at that point). – user3535074 Dec 17 '18 at 22:25

2 Answers2

2
import pandas as pd
from io import StringIO

# Sample data
df = pd.read_fwf(StringIO(
"""BusStop    1st    2nd    3rd    4th    5th BusLine
myStop  20:05  20:16  20:28  20:38  20:52       A
myStop  16:07  17:07  18:13  19:12  20:12       E
myStop  18:26  18:36  18:46  18:58  19:25       K
"""), index=False)


# transform the wide dataframe into a tall dataframe sorted by time
dfm = df.melt(id_vars = ["BusStop", "BusLine"], var_name = 'order', value_name="departure") \
         .sort_values('departure')

# set the currrent time and number of entries to report
# ProabblyuUse se proper date time variables instead
time_now = '16:10'
how_many = 5


# select entries > time_now and get the number wanted        
dfm[dfm.departure > time_now][['BusLine', 'departure']].head(how_many) 


#Out[156]: 
#  BusLine departure
#4       E     17:07
#7       E     18:13
#2       K     18:26
#5       K     18:36
#8       K     18:46
Andrew Lavers
  • 4,328
  • 1
  • 12
  • 19
1

Not clear what the actual logic of the desired output is, but this works:

Find the 3 smallest values in your dataframe (using numpy):

import numpy as np
idx = df.values.ravel().argsort()[:3]

Recover the index 'coordinates' of the smallest values

idxa = np.unravel_index(idx, df.shape)

Create the output column using zip to build the index in a friendly format for pandas:

df['1st'] = [df.iloc[x] for x in list(zip(idxa[0], idxa[1]))]

Create the correct 'BusLine' column (the one matching the min time)

df['BusLine'] = [df.iloc[x,-2] for x in idxa[0]]

Present the result in a clean format

ans = df.iloc[:,[0,-1,-2]]

Output:

    BusStop   1st BusLine
10  myStop  16:07       E
3   myStop  17:07       E
15  myStop  18:13       E
Yuca
  • 6,010
  • 3
  • 22
  • 42
  • That's certainly a way I wouldn't have thought to do this! Thanks! – user3535074 Dec 17 '18 at 22:29
  • this is quite the 'artisan' approach and I'm sure there are multiple ways to do this, so hopefully other people share their views! – Yuca Dec 17 '18 at 22:32