New to Python. I have a pandas dataframe with 100 rows and 275 columns, with neighborhoods as the index and venues as the columns. Many of the venue columns are similar and can be grouped under a wider category. The values in the table are the frequencies of each venue for each neighborhood. I am trying to create a new dataframe that holds, for each category, the sum of the frequencies of the old columns that belong to it.
For example:
import pandas as pd
df = pd.DataFrame({'Area': ['Area1', 'Area2', 'Area3'],
    'Pizza Place': [0.01, 0.02, 0.02], 'Sandwich shop': [0.01, 0.02, 0.02], 'Burger Joint': [0.01, 0.02, 0.02],
    'Park': [0.01, 0.02, 0.02], 'Elementary School': [0.01, 0.02, 0.02], 'Playground': [0.01, 0.02, 0.02]})
I want to create 2 columns that will do something like this:
df['total_fast_food'] = sum of frequencies for columns whose names contain the words 'Pizza', 'Sandwich' or 'Burger'
df['total_kids'] = sum of frequencies for columns whose names contain the words 'Park', 'School' or 'Play'
What I have tried so far:
df.loc[df['Venue Category'].str.contains('Fast Food|Pizza Place|Burger Joint', case=False)] = 'FastFood'
df_new=df_old.filter(like='Fast',axis=1)
df_new['FastFood'] = df_new.sum(axis=1)
With df.loc I can create the new columns in the existing df and remove the ones used as parameters, but the values of the new columns in the dataframe are now all 0.
With filter(like=) I get the sums for all columns that have 'Fast' in their name, which is good, but obviously I cannot use it for several keywords at once, e.g. 'Joint', 'Pizza', etc.
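One direction that might work, as a minimal sketch rather than a confirmed solution: DataFrame.filter also accepts a regex argument, so several keywords can be combined with '|'. The categories dict and the totals name below are hypothetical, introduced only for illustration, and 'Area' is assumed to be set as the index so that it is not part of the sums. Matching is case-sensitive here.

```python
import pandas as pd

# Small example frame; 'Area' is moved to the index (assumption)
df = pd.DataFrame({
    'Area': ['Area1', 'Area2', 'Area3'],
    'Pizza Place': [0.01, 0.02, 0.02],
    'Sandwich shop': [0.01, 0.02, 0.02],
    'Burger Joint': [0.01, 0.02, 0.02],
    'Park': [0.01, 0.02, 0.02],
    'Elementary School': [0.01, 0.02, 0.02],
    'Playground': [0.01, 0.02, 0.02],
}).set_index('Area')

# Hypothetical mapping: new column name -> regex of keywords to look for
categories = {
    'total_fast_food': 'Pizza|Sandwich|Burger',
    'total_kids': 'Park|School|Play',
}

# For each category, keep the columns whose name matches the regex
# and sum them row-wise; all sums are taken from the original df
totals = pd.DataFrame({
    name: df.filter(regex=pattern, axis=1).sum(axis=1)
    for name, pattern in categories.items()
})

print(totals)
```

Building totals as a separate dataframe keeps the new columns from being picked up by a later pattern, which could happen if they were added to df one at a time.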
Any thoughts, please?