My dataset has these columns: company, valuation, date joined, country, city, industry, continent, and year. For each continent, I want to get the industry that has the highest number of companies, along with that number. I don't have a column holding the number of companies per industry per continent, because I don't think adding one is the right thing to do.

I already tried this code: `grouped_df = unicorn_df.groupby(['Continent','Industry']).count()` and then displayed `grouped_df`.

Here's the output [image not available]. What I want is an output that shows this:

| Continent | Industry | Company |
|-----------|----------|---------|
| Africa | Fintech | 3 |
| Asia | E-commerce & direct-to-consumer | 57 |
| Europe | Fintech | 53 |

etc.

What should I do next? Note that the companies need to be counted first, because no column holds the total number of companies per industry per continent.

Thank you so much!

  • To remove the unnecessary columns, use `grouped_df = unicorn_df.groupby(['Continent','Industry'])['Company'].count()` – jezrael Jan 31 '23 at 11:19
  • Yes, but what I want to get is the industry that has the highest number of companies per continent. For example: Asia Fintech 58, etc. How do I do that? – Eva Ananda Jan 31 '23 at 11:26
  • Then check [this](https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby) solution – jezrael Jan 31 '23 at 11:28
  • As I mentioned in my question above, I don't have a specific column for the company count. I've read that solution before, but its problem is a bit different from mine. – Eva Ananda Jan 31 '23 at 11:49
  • So `df = grouped_df.loc[grouped_df.groupby('Continent')['Company'].idxmax()]` is not working? – jezrael Jan 31 '23 at 11:50
  • It works!!! Thanks a bunch! However, I still don't really understand how it works; I will learn that later. Again, thanks! – Eva Ananda Jan 31 '23 at 12:29
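
For anyone else puzzled by the `idxmax` line in the comments, here is a minimal sketch of the same idea split into steps. The toy data below is made up for illustration (the real `unicorn_df` has more columns and rows), and the `reset_index()` calls are my own addition so that the counts sit in an ordinary column; the comments do not use them.

```python
import pandas as pd

# Hypothetical toy data standing in for the real unicorn_df.
unicorn_df = pd.DataFrame({
    "Company":   ["A", "B", "C", "D", "E", "F"],
    "Industry":  ["Fintech", "Fintech", "Health",
                  "E-commerce", "E-commerce", "Fintech"],
    "Continent": ["Africa", "Africa", "Africa", "Asia", "Asia", "Asia"],
})

# Step 1: count companies per (Continent, Industry) pair.
# reset_index() turns the group keys back into regular columns.
counts = (
    unicorn_df.groupby(["Continent", "Industry"])["Company"]
    .count()
    .reset_index()
)

# Step 2: for each continent, get the row label where the count is largest.
top_rows = counts.groupby("Continent")["Company"].idxmax()

# Step 3: keep only those rows.
result = counts.loc[top_rows].reset_index(drop=True)
print(result)
```

Here the `Company` column of `result` holds the count, not a company name. If two industries tie within a continent, `idxmax` keeps only the first one it encounters, so a different approach would be needed to list all tied industries.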
