New to Python. I have a pandas dataframe with 100 rows and 275 columns, with neighborhoods as the index and venues as the columns. Many of the venue columns are similar and can be grouped under a wider category. The values in the table are the frequencies of each venue per neighborhood. I am trying to create a new dataframe that holds the sums of the frequencies of the old columns, grouped by category.

For example:

import pandas as pd

df = pd.DataFrame({'Area': ['Area1', 'Area2', 'Area3'],
                   'Pizza Place': [0.01, 0.02, 0.02], 'Sandwich shop': [0.01, 0.02, 0.02], 'Burger Joint': [0.01, 0.02, 0.02],
                   'Park': [0.01, 0.02, 0.02], 'Elementary School': [0.01, 0.02, 0.02], 'Playground': [0.01, 0.02, 0.02]})

I want to create two columns that do something like this:

df['total_fast_food'] = sum of frequencies for columns that contain the words 'Pizza', 'Sandwich', 'Burger' in their name
df['total_kids'] = sum of frequencies for columns that contain the words 'Park', 'School', 'Play' in their name

What I tried so far:

df.loc[df['Venue Category'].str.contains('Fast Food|Pizza Place|Burger Joint', case=False)] = 'FastFood'
df_new=df_old.filter(like='Fast',axis=1)
df_new['FastFood'] = df_new.sum(axis=1)

With df.loc I can create the new columns in the existing df and remove the ones used as parameters, but the values of the new columns are then all 0.

With filter(like=) I get the sums for all columns that have 'Fast' in their name, which is good, but obviously I cannot use it for several keywords at once, e.g. 'Joint', 'Pizza', etc.

Any thoughts, please?

DimitrisM
  • Welcome to Stack Overflow and Python. Please review this to obtain a good answer: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Rich Andrews Apr 02 '19 at 13:47

1 Answer

Without an MCVE that includes input data, I can only offer an approximate answer. It is also unclear along which axis the values are to be counted.

Since the question's code refers to a 'Venue Category' column, the example below counts the values of a categorical column.

import pandas as pd

venue = ["Fast Food", "Pizza Place", "Burger Joint", "Fast Food", "Pizza Place", "Burger Joint", "Burger Joint", "Fast Food", "Fast Food"]
df = pd.DataFrame({"Venue":venue})
df["Venue Category"] = pd.Categorical(df['Venue'])
print(df["Venue Category"].value_counts())
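
For the wide layout shown in the question (one column per venue), a minimal sketch along these lines may be closer to what is wanted. It assumes that matching column names by keyword is acceptable and that 'Area' can be used as the index; the `groups` dict and the `totals` name are illustrative, not part of the original data. `DataFrame.filter(regex=...)` accepts a regular expression, so several keywords can be combined with `|`, which `like=` does not allow.

```python
import pandas as pd

# Sample data from the question, with 'Area' as the index (an assumption).
df = pd.DataFrame({
    "Area": ["Area1", "Area2", "Area3"],
    "Pizza Place": [0.01, 0.02, 0.02],
    "Sandwich shop": [0.01, 0.02, 0.02],
    "Burger Joint": [0.01, 0.02, 0.02],
    "Park": [0.01, 0.02, 0.02],
    "Elementary School": [0.01, 0.02, 0.02],
    "Playground": [0.01, 0.02, 0.02],
}).set_index("Area")

# Hypothetical mapping: new column name -> regex of keywords to look for
# in the existing column names.
groups = {
    "total_fast_food": "Pizza|Sandwich|Burger",
    "total_kids": "Park|School|Play",
}

# For each group, keep the columns whose name matches the pattern
# and sum them row by row.
totals = pd.DataFrame({
    name: df.filter(regex=pattern, axis=1).sum(axis=1)
    for name, pattern in groups.items()
})
print(totals)
```

The match is case-sensitive; prefixing a pattern with `(?i)` should make it case-insensitive if the real column names vary in capitalization.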
Rich Andrews