Given the following pandas DataFrame:

| date          | days       | country   |
| ------------- | ---------- | --------- |
| 2022-02-01    | 10         |  Spain    |
| 2022-02-02    | 20         |  Spain    |
| 2022-02-01    | 10         |  Italy    |
| 2022-02-03    | 41         |  France   |
| 2022-02-03    | 13         |  Germany  |
| 2022-02-04    | 11         |  Italy    |
| 2022-02-04    | 1          |  UK       |
| 2022-02-05    | 20         |  UK       |
| 2022-02-04    | 50         |  Spain    |
| 2022-02-04    | 11         |  Portugal |

I want to rank the countries by the number of rows in which each country appears:

| country          | count       |
| ---------------- | ----------- |
| Spain            | 3           |
| Italy            | 2           |
| UK               | 2           |
| France           | 1           |
| Germany          | 1           |
| Portugal         | 1           |

Finally, I want to return the countries as a list of strings, ordered from the most rows to the fewest.

Expected result: countries = ['Spain', 'Italy', 'UK', 'France', 'Germany', 'Portugal']
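One way this could be done, as a minimal sketch: `Series.value_counts()` counts the rows per value and sorts the result by count in descending order by default, so its index is already the ranking. The DataFrame below just rebuilds the sample data from the question, and `ranking` is an illustrative name. Note that the relative order of countries with the same count (for example Italy and UK) should not be relied on.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "date": ["2022-02-01", "2022-02-02", "2022-02-01", "2022-02-03", "2022-02-03",
             "2022-02-04", "2022-02-04", "2022-02-05", "2022-02-04", "2022-02-04"],
    "days": [10, 20, 10, 41, 13, 11, 1, 20, 50, 11],
    "country": ["Spain", "Spain", "Italy", "France", "Germany",
                "Italy", "UK", "UK", "Spain", "Portugal"],
})

# Count rows per country; value_counts() sorts by count, descending, by default
ranking = df["country"].value_counts()

# The index of the result holds the country names in ranked order
countries = ranking.index.tolist()
print(countries)
```

If the count table itself is needed, `ranking` already holds it as a Series indexed by country.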

The goal is to use this array to build a subset for each of the top 3 countries and analyse the data for each one:

    df1 = df[df['country'] == countries[0]].reset_index(drop=True)
    df2 = df[df['country'] == countries[1]].reset_index(drop=True)
    df3 = df[df['country'] == countries[2]].reset_index(drop=True)

The ranking of countries may change from month to month, which is why I want to compute it from the data rather than hard-code the country names.
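Since the top countries can change each month, the per-country subsets could also be built in one step instead of as separate numbered variables. This is only a sketch: `top3` and `subsets` are illustrative names, and the DataFrame again reproduces the sample data from the question.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "date": ["2022-02-01", "2022-02-02", "2022-02-01", "2022-02-03", "2022-02-03",
             "2022-02-04", "2022-02-04", "2022-02-05", "2022-02-04", "2022-02-04"],
    "days": [10, 20, 10, 41, 13, 11, 1, 20, 50, 11],
    "country": ["Spain", "Spain", "Italy", "France", "Germany",
                "Italy", "UK", "UK", "Spain", "Portugal"],
})

# Names of the three countries with the most rows
top3 = df["country"].value_counts().head(3).index.tolist()

# One DataFrame per top country, keyed by country name
subsets = {
    country: df[df["country"] == country].reset_index(drop=True)
    for country in top3
}

for country, subset in subsets.items():
    print(country, len(subset))
```

Keying the subsets by country name means the analysis code does not need to know which country currently holds which rank.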

Carola