
I've been trying to find out the top-3 highest frequency restaurant names under each type of restaurant

[screenshot of the input dataframe]

The columns are:

rest_type - Column for the type of restaurant

name - Column for the name of the restaurant

url - Column used for counting occurrences

This was the code that ended up working for me after some searching:

df_1 = df.groupby(['rest_type', 'name']).agg('count')
datas = (df_1.groupby(['rest_type'], as_index=False)
             .apply(lambda x: x.sort_values(by='url', ascending=False).head(3))
             ['url'].reset_index().rename(columns={'url': 'count'}))

The final output was as follows:

[screenshot of the final output]

I had a few questions pertaining to the above code:

How are we able to group by `rest_type` again for the `datas` variable after already grouping on it earlier? Should it not raise a missing-column error? The second groupby operation is a bit confusing to me.

What does the first generated column, `level_0`, signify? When I tried the code with `as_index=True`, it created both an index and a column for `rest_type`, so I couldn't reset the index. Output below:

[screenshot of the output with as_index=True]

Thank you

Naman Sood
  • Please share a sample of your original `df` for a [MRE](https://stackoverflow.com/help/minimal-reproducible-example) – Corralien Jun 30 '21 at 08:00
  • From MRE, *"DO NOT use images of code. Copy the actual text from your code editor, paste it into the question, then format it as code. This helps others more easily read and test your code."* and read https://stackoverflow.com/q/20109391/15239951 – Corralien Jun 30 '21 at 08:19

2 Answers


You can use groupby a second time because `rest_type` is now part of the index, and groupby accepts index level names as grouping keys.
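A minimal sketch (with made-up data, not yours) showing that after the first groupby/agg the keys live in the index, where a second groupby can still find them:

```python
import pandas as pd

# Tiny stand-in for the restaurant data (values are assumptions).
df = pd.DataFrame({'rest_type': ['A', 'A', 'B'],
                   'name': ['x', 'y', 'x'],
                   'url': ['u1', 'u2', 'u3']})

# After groupby(...).agg, the grouping keys become index levels, not columns.
df_1 = df.groupby(['rest_type', 'name']).agg('count')
print(df_1.index.names)  # both keys are index levels now

# This works because 'rest_type' is resolved as an index level name.
print(df_1.groupby('rest_type').size())
```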

`level_0` comes from the `reset_index` call: because that index level is unnamed, `reset_index` assigns it the default name `level_0`.
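A tiny illustration, unrelated to your data: resetting an unnamed index falls back to the default names `level_0`, `level_1`, and so on:

```python
import pandas as pd

# A two-level index with no names: reset_index has nothing to reuse,
# so it generates 'level_0' and 'level_1' as column names.
s = pd.Series([10, 20], index=pd.MultiIndex.from_tuples([(0, 'A'), (1, 'B')]))
out = s.reset_index()
print(out.columns.tolist())  # ['level_0', 'level_1', 0]
```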

That said, and provided I understand your dataset correctly, I think you could achieve your goal more easily:

import random
import pandas as pd

df = pd.DataFrame({'rest_type': random.choices('ABCDEF', k=20),
                   'name': random.choices('abcdef', k=20),
                   'url': range(20),  # looks like this is a unique identifier
                  })

def tops(s, n=3):
    return s.value_counts().sort_values(ascending=False).head(n)

df.groupby('rest_type')['name'].apply(tops, n=3)
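For example, on a small fixed frame (made-up values), this returns the per-group top counts as a Series indexed by (rest_type, name):

```python
import pandas as pd

# Made-up sample data, just to show the shape of the result.
df = pd.DataFrame({'rest_type': ['A', 'A', 'A', 'B'],
                   'name':      ['x', 'x', 'y', 'z']})

def tops(s, n=3):
    # Count occurrences within the group and keep the n most frequent.
    return s.value_counts().sort_values(ascending=False).head(n)

res = df.groupby('rest_type')['name'].apply(tops, n=3)
print(res)  # (A, x) -> 2, (A, y) -> 1, (B, z) -> 1
```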

Edit: here is an alternative that formats the result as a dataframe with informative column names:

(df.groupby('rest_type')
   .apply(lambda x: x['name'].value_counts().nlargest(3))
   .reset_index().rename(columns={'name': 'counts', 'level_1': 'name'})
)
mozway

I have a similar case where the above query only partially works: in my case, the cooccurrence value always comes out as 1. Here is my input data frame:

[screenshot of the input dataframe]

And my query is below

top_five_family_cooccurence_df = (
    common_top25_cooccurance1_df.groupby('family')
    .apply(lambda x: x['related_family'].value_counts().nlargest(5))
    .reset_index()
    .rename(columns={'related_family': 'cooccurence', 'level_1': 'related_family'})
)

I am getting this result:

[screenshot of the output dataframe]

The cooccurrence value is always 1.