I know this should be easy but it's driving me mad...

I am trying to turn a dataframe into a grouped dataframe.

df outputs:

    Postcode    Borough             Neighbourhood
0   M3A         North York          Parkwoods
1   M4A         North York          Victoria Village
2   M5A         Downtown Toronto    Harbourfront
3   M5A         Downtown Toronto    Regent Park
4   M6A         North York          Lawrence Heights
5   M6A         North York          Lawrence Manor
6   M7A         Queen's Park        Not assigned
7   M9A         Etobicoke           Islington Avenue
8   M1B         Scarborough         Rouge
9   M1B         Scarborough         Malvern
10  M3B         North York          Don Mills North
...

I want to group the rows by Postcode so that each Postcode appears once and its Neighbourhood values are concatenated into a single comma-separated string, something like:

    Postcode    Borough             Neighbourhood
0   M3A         North York          Parkwoods
1   M4A         North York          Victoria Village
2   M5A         Downtown Toronto    Harbourfront, Regent Park
...

I am trying to use:

df.groupby(['Postcode'])['Neighbourhood'].apply(lambda strs: ', '.join(strs))

But this does not change df: when I inspect df after running it, I still see the original, ungrouped dataframe.

If I use:

df = df.groupby(['Postcode'])['Neighbourhood'].apply(lambda strs: ', '.join(strs))

it turns df into an object that is no longer a dataframe?

M A
  • https://stackoverflow.com/questions/18138693/replicating-group-concat-for-pandas-dataframe – Matthew Barlowe May 30 '19 at 17:45
  • thanks.. looks like I'm on the right track but I still can't get the dataframe to appear correct. `df.groupby('Postcode').agg({'Neighbourhood':lambda x:', '.join(x)})` and then `df` still returns an ungrouped dataframe... – M A May 30 '19 at 18:40
  • If you don't assign the result to a variable, it won't. I'm pretty sure `groupby` isn't done in place; it returns a new object – Matthew Barlowe May 30 '19 at 18:41
  • So it looks like all that will do is create a new dataframe with Postcode as the index but the Neighbourhood looks correct.. need to figure out how to get it back into the original dataframe now.. – M A May 30 '19 at 18:56
  • add `.reset_index()` to the end of your chain. Docs can be found [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html) – Matthew Barlowe May 30 '19 at 18:57

1 Answer

Use this code:

new_df = df.groupby(['Postcode', 'Borough']).agg({'Neighbourhood':lambda x:', '.join(x)}).reset_index()

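`reset_index()` moves the group-by columns (`Postcode` and `Borough`) out of the index and back into regular columns, and gives the result a new integer index. Grouping on `Borough` as well as `Postcode` keeps that column in the output. The result has to be assigned to a variable, because `groupby` does not modify `df` in place.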
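As a rough, self-contained sketch of the above: the sample rows below are copied from the first six rows in the question, and the name `as_series` is only illustrative. It also shows why the attempt in the question seemed to turn `df` into "an object": selecting a single column and calling `apply` gives back a Series indexed by Postcode, not a dataframe.

```python
import pandas as pd

# First six rows from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M5A", "M5A", "M6A", "M6A"],
    "Borough": ["North York", "North York", "Downtown Toronto",
                "Downtown Toronto", "North York", "North York"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Harbourfront",
                      "Regent Park", "Lawrence Heights", "Lawrence Manor"],
})

# The attempt from the question: this yields a Series indexed by Postcode.
as_series = df.groupby(["Postcode"])["Neighbourhood"].apply(lambda strs: ", ".join(strs))
print(type(as_series))

# The approach from this answer: group on both key columns, join the
# neighbourhoods, then turn the group keys back into ordinary columns.
new_df = (
    df.groupby(["Postcode", "Borough"])
      .agg({"Neighbourhood": lambda x: ", ".join(x)})
      .reset_index()
)
print(new_df)

# df itself is untouched; only new_df holds the grouped result.
print(len(df), len(new_df))
```

With this sample, `new_df` should have one row per Postcode, with the M5A row reading `Harbourfront, Regent Park`.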

Matthew Barlowe