I am trying to groupby a column in a pandas dataframe!

The code:

import pandas as pd

stats_reader = pd.read_csv('C:/Users/Name/PycharmProjects/Corona Stats/TimeSeries/03-20-2020.csv')
stats_clean = stats_reader.drop(['Province/State', 'Last Update', 'Latitude', 'Longitude'], axis=1)
stats_clean.reset_index(drop=True, inplace=True)
stats_clean.groupby(['Country/Region'])
stats_clean.to_csv('Clean Corona Stats.csv')

The result:

,Country/Region,Confirmed,Deaths,Recovered
0,China,67800,3133,58382
1,Italy,47021,4032,4440
2,Spain,20410,1043,1588
3,Germany,19848,67,180
4,Iran,19644,1433,6745
5,France,12612,450,12
6,"Korea, South",8652,94,1540
7,US,8310,42,0
8,Switzerland,5294,54,15
9,United Kingdom,3983,177,65
10,Netherlands,2994,106,2
11,Austria,2388,6,9
12,Belgium,2257,37,1
13,Norway,1914,7,1
14,Sweden,1639,16,16
15,US,1524,83,0
...

The desired result is, obviously, to group the rows by Country/Region. I assumed that it would at least bring all rows with the same value together, but the dataframe stays the same with this code.

I have tried:

stats_clean.groupby(['Country/Region'])['Confirmed'].sum()

which also produces no change in the original dataframe. What am I missing here? I feel this should do at least something, but there is NO change no matter what I do other than dropping columns. I ran everything in Jupyter just to make sure PyCharm wasn't broken, but I get the same results.

Luck Box
    You need to assign the result to something: `stats_clean = stats_clean.groupby(['Country/Region'])['Confirmed'].sum()`. You're getting confused with things that are done inplace and others which are not. I suggest you completely forget about using `inplace=True` and always assign back `df = df...`. There's no advantage and it's likely going to be deprecated at some point. Best to plan ahead now: https://stackoverflow.com/questions/43893457/understanding-inplace-true – ALollz Mar 22 '20 at 22:17
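    A minimal sketch of the point made in this comment, assuming a small stand-in dataframe built from four rows of the question's sample (the name `confirmed_by_country` is made up for illustration). Calling `groupby(...).sum()` returns a new object and leaves the original dataframe alone, so the result only becomes visible once it is assigned to a name:

    ```python
    import pandas as pd

    # Stand-in for the question's data (values copied from the sample above).
    stats_clean = pd.DataFrame({
        "Country/Region": ["China", "US", "Italy", "US"],
        "Confirmed": [67800, 8310, 47021, 1524],
    })

    # The result of this call is discarded; stats_clean itself is not modified.
    stats_clean.groupby(["Country/Region"])["Confirmed"].sum()
    rows_after_unassigned_call = len(stats_clean)  # still the original row count

    # Assigning the result keeps the aggregated Series around for later use.
    confirmed_by_country = stats_clean.groupby(["Country/Region"])["Confirmed"].sum()
    print(confirmed_by_country)
    ```

    Here a separate name is used rather than overwriting `stats_clean`, so both the raw rows and the totals stay available; assigning back to `stats_clean` would work just as well.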

1 Answer

I have no clue what your problem is; my exact copy (with slight modifications to your sample so it can be read in) does exactly what groupby() is intended to do.

Sample for copy/paste (the only thing I did here was remove the quotes and comma from `"Korea, South"`):

,Country/Region,Confirmed,Deaths,Recovered
0,China,67800,3133,58382
1,Italy,47021,4032,4440
2,Spain,20410,1043,1588
3,Germany,19848,67,180
4,Iran,19644,1433,6745
5,France,12612,450,12
6,Korea South,8652,94,1540
7,US,8310,42,0
8,Switzerland,5294,54,15
9,United Kingdom,3983,177,65
10,Netherlands,2994,106,2
11,Austria,2388,6,9
12,Belgium,2257,37,1
13,Norway,1914,7,1
14,Sweden,1639,16,16
15,US,1524,83,0
import pandas as pd

# copy above sample
df = pd.read_clipboard(sep=',', index_col=0)
df1 = df.groupby(['Country/Region'])['Confirmed'].sum()

print(df1)

Country/Region
Austria            2388
Belgium            2257
China             67800
France            12612
Germany           19848
Iran              19644
Italy             47021
Korea South        8652
Netherlands        2994
Norway             1914
Spain             20410
Sweden             1639
Switzerland        5294
US                 9834
United Kingdom     3983
Name: Confirmed, dtype: int64

Since US is the only country that appears twice in this sample, its Confirmed values are aggregated with .sum(); the rest of the groups (Country/Regions) remain the same.
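
If the goal is one row per country for all of the numeric columns, and a CSV written from that, something along these lines should work. This is only a sketch: the dataframe is a four-row stand-in built from the sample values, and the names `totals` and `csv_text` are made up for illustration. Passing `as_index=False` keeps Country/Region as a regular column instead of turning it into the index.

```python
import pandas as pd

# Four rows copied from the sample above; US appears twice.
df = pd.DataFrame({
    "Country/Region": ["China", "Italy", "US", "US"],
    "Confirmed": [67800, 47021, 8310, 1524],
    "Deaths": [3133, 4032, 42, 83],
    "Recovered": [58382, 4440, 0, 0],
})

# Sum every numeric column per country and keep the key as a normal column.
totals = df.groupby("Country/Region", as_index=False).sum()

# Optional: largest outbreaks first, with a clean 0..n-1 index.
totals = totals.sort_values("Confirmed", ascending=False).reset_index(drop=True)

# With no path given, to_csv returns the CSV as a string;
# pass a filename instead to write it to disk.
csv_text = totals.to_csv(index=False)
print(csv_text)
```

The key point is the same as for the single-column case: the grouped result is a new object, so it has to be assigned to a name before it is written out.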

Ukrainian-serge