
I have a pandas dataframe, which looks like this:

   Country  City       POI        Type
0  NL       Amsterdam  KFC        restaurant
1  NL       Amsterdam  KFC        cafe
2  NL       Arnhem     McDonalds  fast food
3  NL       Arnhem     McDonalds  ice cream

I need to combine the values of the Type column so that rows which are identical in all other columns are merged into one. In other words, I need an output like this:

   Country  City       POI        Type
0  NL       Amsterdam  KFC        restaurant, cafe
1  NL       Arnhem     McDonalds  fast food, ice cream

I tried to use the groupby function, but all the column names disappear, and the shape attribute shows 0 columns. Maybe there is a better way to group those values?

Here is some sample code:

import pandas as pd
import numpy as np
data = np.array([['','Country','City', 'POI', 'Type'],
            [0,"NL","Amsterdam", 'KFC', 'cafe'],
            [1,"NL","Amsterdam", 'KFC', 'restaurant'],
            [2,"NL","Arnhem", 'McDonalds', 'fast-food'],
            [3,"NL","Arnhem", 'McDonalds', 'ice cream']]
           )

initial_df = pd.DataFrame(data=data[1:,1:],
              index=data[1:,0],
              columns=data[0,1:])

final_df = initial_df.groupby(["Country", "City", "POI", "Type"]).count()

print(list(final_df.columns.values))
print(final_df.shape)
jpp
ogull

2 Answers

You can group by the other columns and combine the Type values with str.join:

res = df.groupby(['Country', 'City', 'POI'])['Type'].apply(', '.join).reset_index()

print(res)

  Country       City        POI                Type
0      NL  Amsterdam        KFC    restaurant, cafe
1      NL     Arnhem  McDonalds  fastfood, icecream
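
As a self-contained sketch of the same idea, here is the approach applied to the sample data from the question. The DataFrame is rebuilt from a dict purely for brevity (an assumption; the question builds it from a NumPy array), and it keeps the question's variable name initial_df rather than df:

```python
import pandas as pd

# Sample data from the question, rebuilt from a dict for brevity.
initial_df = pd.DataFrame({
    "Country": ["NL", "NL", "NL", "NL"],
    "City": ["Amsterdam", "Amsterdam", "Arnhem", "Arnhem"],
    "POI": ["KFC", "KFC", "McDonalds", "McDonalds"],
    "Type": ["cafe", "restaurant", "fast-food", "ice cream"],
})

# Group by every column except Type, then join the Type strings of each group.
# reset_index() turns the group keys back into ordinary columns.
res = (
    initial_df
    .groupby(["Country", "City", "POI"])["Type"]
    .agg(", ".join)
    .reset_index()
)

print(res)
```

Because the group keys come back as regular columns after reset_index(), the result keeps its column names, which is the part that went missing in the original attempt.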
jpp

Your final_df has no columns because you asked pandas to group by all of your columns: every column became a grouping key and moved into the index, so nothing was left to count. If you only want to group by the "Type" column, here is what you should do:

grouped = initial_df.groupby("Type")

You then applied the count() function to the grouped DataFrame. This counts the non-NaN elements in each column for each of your groups. What you actually want to do, though, is access each group. You can do that like this:

for name, group in grouped:
    print(name)   # this prints the Type of your group
    print(group)  # this prints the DataFrame corresponding to your Type
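
To see the zero-column effect directly, here is a small sketch using the question's sample data (rebuilt from a dict for brevity, which is an assumption). It also shows size(), which is one possible way to get a per-group row count when there are no leftover columns for count() to work on:

```python
import pandas as pd

# Sample data from the question, rebuilt from a dict for brevity.
initial_df = pd.DataFrame({
    "Country": ["NL", "NL", "NL", "NL"],
    "City": ["Amsterdam", "Amsterdam", "Arnhem", "Arnhem"],
    "POI": ["KFC", "KFC", "McDonalds", "McDonalds"],
    "Type": ["cafe", "restaurant", "fast-food", "ice cream"],
})

# Grouping by every column leaves no non-key column for count() to report on.
counted = initial_df.groupby(["Country", "City", "POI", "Type"]).count()
print(counted.shape)              # the second number is the column count
print(list(counted.index.names))  # the original columns now live in the index

# size() counts rows per group and does not need any leftover columns.
sizes = initial_df.groupby(["Country", "City", "POI"]).size()
print(sizes)
```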

Hope this helped.

Gozy4