Issue with joining repeated values/rows

Question

New to python, can't seem to understand how to proceed. After using bin and editing my data frame I was able to come up with this :

    Continents  % Renewable Country
0   Asia    (15.753, 29.227]    China
1   North America   (2.212, 15.753] United States
2   Asia    (2.212, 15.753] Japan
3   Europe  (2.212, 15.753] United Kingdom
4   Europe  (15.753, 29.227]    Russian Federation
5   North America   (56.174, 69.648]    Canada
6   Europe  (15.753, 29.227]    Germany
7   Asia    (2.212, 15.753] India
8   Europe  (15.753, 29.227]    France
9   Asia    (2.212, 15.753] South Korea
10  Europe  (29.227, 42.701]    Italy
11  Europe  (29.227, 42.701]    Spain
12  Asia    (2.212, 15.753] Iran
13  Australia   (2.212, 15.753] Australia
14  South America   (56.174, 69.648]    Brazil

Now when I set the Continents and % Renewable as a miltiindex using :

Top15 = Top15.groupby(by=['Continents', '% Renewable']).sum()

to get the following:

                            Country
Continents  % Renewable   
Asia    (15.753, 29.227]    China
         (2.212, 15.753]    JapanIndiaSouth KoreaIran
Australia   (2.212, 15.753] Australia
Europe  (15.753, 29.227]    Russian FederationGermanyFrance
        (2.212, 15.753] United Kingdom
        (29.227, 42.701]    ItalySpain
North America   (2.212, 15.753] United States
                 (56.174, 69.648]   Canada
South America   (56.174, 69.648]    Brazil

Now I would like to have a column that would give me the number of countries in each index ie:

In the 1st Row - China =1 ,

and in the 2nd Row JapanIndiaSouth KoreaIran would be 4

So in the end I want something like this :

Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1

I just don't know how to get there.

Also, the numbers need to be sorted in descending order, while still keeping the index grouping in place.

piRSquared · Accepted Answer · 2017-01-11T07:56:54.900

Top15.groupby(['Continents', '% Renewable']).Country.count()

Continents     % Renewable     
Asia           (15.753, 29.227]    1
               (2.212, 15.753]     4
Australia      (2.212, 15.753]     1
Europe         (15.753, 29.227]    3
               (2.212, 15.753]     1
               (29.227, 42.701]    2
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
Name: Country, dtype: int64

To sort in the order you'd like

Top15_count = Top15.groupby(['Continents', '% Renewable']).Country.count()

Top15_count.reset_index() \
    .sort_values(
        ['Continents', 'Country'],
        ascending=[True, False]
    ).set_index(['Continents', '% Renewable']).Country

Continents     % Renewable     
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
Australia      (2.212, 15.753]     1
Europe         (15.753, 29.227]    3
               (29.227, 42.701]    2
               (2.212, 15.753]     1
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
Name: Country, dtype: int64

thank you , but Is there a way to have it be descending order ? like in case of Asia the 4 be the first row and 1 be the next. — Aasheet Kumar, Jan 11 '17 at 07:47

score 2 · Answer 2 · edited Sep 27 '17 at 16:53

Solution with size:

What is the difference between size and count in pandas?

print (Top15.groupby(['Continents', '% Renewable']).size())
Name: Country, dtype: int64
Continents     % Renewable     
Asia           (15.753, 29.227]    1
               (2.212, 15.753]     4
Australia      (2.212, 15.753]     1
Europe         (15.753, 29.227]    3
               (2.212, 15.753]     1
               (29.227, 42.701]    2
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
dtype: int64

Use sort_values if need change order and for dataframe add reset_index, last if need MultiIndex add set_index:

print (Top15.groupby(['Continents', '% Renewable']) \
            .size() \
            .reset_index(name='COUNT') \
            .sort_values(['Continents', 'COUNT'], ascending=[True, False]) \
            .set_index(['Continents','% Renewable']).COUNT)

Continents     % Renewable     
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
Australia      (2.212, 15.753]     1
Europe         (15.753, 29.227]    3
               (29.227, 42.701]    2
               (2.212, 15.753]     1
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
Name: COUNT, dtype: int64

thank you , but Is there a way to have it be descending order ? like in case of Asia the 4 be the first row and 1 be the next. — Aasheet Kumar, Jan 11 '17 at 07:48
Sorry, for this but i need the Asia 4 and asia 1 to be together, ie first row to be asia with value 4 then 1 , and after that Australia with row with value 3 then 2 then 1. Basically index gets first priority in sorting a-z, and then the values after that. — Aasheet Kumar, Jan 11 '17 at 07:57

Issue with joining repeated values/rows

2 Answers2