0

I need some help.

Let's say I have a DataFrame called venues_df (shown as an image in the original post).

I also have this function: return_most_common_venues

def return_most_common_venues(row, num_top_venues):
    # Selects the row values, skipping the first column
    row_values = row.iloc[1:]

    # Sorts the selected row values in descending order
    row_values_sorted = row_values.sort_values(ascending=False)

    # Returns the column names of the first num_top_venues sorted values
    return row_values_sorted.index.values[0:num_top_venues]

If I apply my function on the first row:

return_most_common_venues(venues_df.iloc[0, :], 4)

The result will be an array (the tables, shown as images in the original post, are for illustration purposes only):

array(['Bar', 'Restaurant', 'Park', 'Gym'])

The problem is when I apply my function to the second row.

return_most_common_venues(venues_df.iloc[1, :], 4)

I will get

array(['Park', 'Restaurant', 'Gym', 'SuperMarket'])

What I need is for it to return:

array(['Bar', 'Restaurant', 'Not Available', 'Not Available'])

If the value is zero, I need it to return 'Not Available' instead of the column names 'Gym' and 'SuperMarket'.

How can I modify my function to return what I need?

Thank you for your help!

Efren

Efren M

2 Answers

0

I suggest the following based on this question:

import pandas as pd

def return_most_common_venues(row, nb_return_values=4):
    # Selects the row values, skipping the first column
    row_values = row.iloc[1:]

    # Keeps only the positive values and sorts them in descending order
    row_values_sorted = row_values[row_values > 0].sort_values(ascending=False)

    # Returns the column names of the top values, padded with 'Not available'
    top_venues = list(row_values_sorted.index.values[0:nb_return_values])
    output = top_venues + ['Not available'] * (nb_return_values - len(top_venues))
    return output


df = pd.DataFrame([[7, 4, 1, 5, 9, 3], [5, 0, 0, 8, 0, 0]], 
                  columns=["Restaurant", "Gym", "Supermarket", "Park", "Bar", "Café"],
                  index=[0,1])

return_most_common_venues(df.iloc[1, :], 4)

And the result is:

['Park', 'Not available', 'Not available', 'Not available']
Raphaele Adjerad
0
import numpy as np

def return_most_common_venues(df, row, cols):

    # Selects the row values
    row_values = df.loc[row]

    # Takes the `cols` largest values (argsort is ascending) and reverses them into descending order
    row_values_sorted = row_values.iloc[np.argsort(row_values.to_numpy())[-cols:]][::-1]

    # Returns the column names of the top `cols` values, or "Not Available" where the value
    # is not positive (NaN > 0 is False, so missing values are covered as well)
    return [index if value > 0 else "Not Available" for index, value in zip(row_values_sorted.index, row_values_sorted.values)]

return_most_common_venues(df, row=1, cols=4)

Output:

['Park', 'Restaurant', 'Not Available', 'Not Available']
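
As a possible alternative to the argsort approach, here is a minimal sketch that uses pandas' Series.nlargest and applies it to every row at once. It assumes every column passed in is numeric (a non-numeric column such as a neighborhood name would have to be dropped or moved into the index first). The names top_venues and placeholder are made up for illustration and are not from the original posts.

```python
import pandas as pd

def top_venues(row, n=4, placeholder="Not Available"):
    # Hypothetical helper: `row` is assumed to be a numeric Series indexed by venue name.
    # nlargest returns the n largest values in descending order and skips NaN.
    top = row.nlargest(n)
    # Keep the venue name only when its count is positive.
    names = [name if count > 0 else placeholder for name, count in top.items()]
    # Pad in case fewer than n values were available (for example, because of NaN).
    return names + [placeholder] * (n - len(names))

df = pd.DataFrame([[7, 4, 1, 5, 9, 3], [5, 0, 0, 8, 0, 0]],
                  columns=["Restaurant", "Gym", "Supermarket", "Park", "Bar", "Café"])

# One output row per input row, one output column per rank.
result = df.apply(top_venues, axis=1, result_type="expand")
```

With result_type="expand", the list returned for each row is spread across columns 0 to 3, so the ranked venues for the whole DataFrame end up in a single table.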
TH14
  • Thanks so much, TH14! That did it. I have one follow-up question, though. After sorting the values, what do these elements do: [-cols:]][::-1] on the third line of code? – Efren M Apr 26 '20 at 01:41
  • My pleasure! If you break it apart, [-cols:] returns the cols largest values (4 here) in ascending order, and the [::-1] part just reverses the order. https://stackoverflow.com/questions/16486252/is-it-possible-to-use-argsort-in-descending-order – TH14 Apr 26 '20 at 02:19
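
The slicing described in the comment above can be seen step by step on a plain NumPy array. This is only an illustrative sketch: the array reuses the second row of the sample data, and it takes the top 2 values rather than 4 so that the tied zeros do not affect the result.

```python
import numpy as np

values = np.array([5, 0, 0, 8, 0, 0])

# argsort returns the positions that would sort the array in ascending order.
order = np.argsort(values)

# The last 2 positions point at the 2 largest values, smallest of them first.
top_ascending = order[-2:]

# [::-1] reverses that slice, so the largest value comes first.
top_descending = top_ascending[::-1]

top_values = values[top_descending]
```

Here top_descending holds positions into the original array, which is why the same positions can be used to look up the matching column names.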