[Image: cuisine frequency dataframe]

[Image: original dataframe]

I have two dataframes of unequal size. One contains each cuisine style along with its frequency in the dataset; the other is the original dataset, which has the restaurant name and its corresponding cuisine. I want to add a new column to the original dataset that shows the frequency of each row's cuisine, taken from the dataframe containing the frequency data. What is the best way to do that? I tried using merge, but that creates NaN values. Please suggest.

I tried the suggested code snippet below, but it did not give me the required result. It generates freq for the first row and excludes the other rows with the same 'name' value.

df = df.assign(freq=0)
# get all the cuisine styles in the cuisine df
for cuisine in np.unique(cuisine_df['cuisine_style']):
    # get the freq
    freq = cuisine_df.loc[cuisine_df['cuisine_style'] == cuisine, 'freq'].values
    # update value in main df
    df.loc[df['cuisine_style'] == cuisine, 'freq'] = freq

[Image: result dataframe]

Deb
  • Can you add the datasets to question instead of photos? – Ch3steR May 16 '20 at 04:14
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – r.ook May 16 '20 at 06:34
  • Particularly, you might try `original_df.merge(cuisine_df, how='left')`. If that doesn't work, you will need to produce a [mre] to support your question. – r.ook May 16 '20 at 06:35
  • Huh. I see the error you mentioned. I reran my code on a DataFrame with duplicate values in the 'name' column and it still worked for me. I don't see any errors, but it might be easier if you post the actual dataset for people to try it on. – Tyler Wanta May 18 '20 at 01:00
  • Unfortunately, it does not give the result. I have edited my question with the code that you suggested and the result I am seeing. Mind taking a look? – Deb May 18 '20 at 02:00
  • https://www.kaggle.com/damienbeneschi/krakow-ta-restaurans-data-raw. This is the URL for the dataset. All I have done is a bit of cleaning and pre-processing – Deb May 18 '20 at 02:05
  • I updated my answer to use your data set. Check it out and let me know if you have any other questions. – Tyler Wanta May 18 '20 at 21:28

1 Answer

I reran the code on your data set and still got the same results. Here is the code I ran.

import pandas as pd
import numpy as np

# set 'Cuisine Style' to the first style in each row's list of values
def getCuisine(row):
    # assumes the raw value is a string such as "['A', 'B']";
    # splitting on ' then leaves the first style at index 1
    arr = row['Cuisine Style'].split("'")
    return arr[1]

# read in data set. Used first col for index and drop nan for ease of use
csv = pd.read_csv('TA_restaurants_curated.csv', index_col=0).dropna()

# get cuisine values
cuisines = csv.apply(getCuisine, axis=1)

# update dataframe
csv['Cuisine Style'] = cuisines

# dict to quickly make a new data frame with meaningless placeholder frequencies
styles = np.unique(csv['Cuisine Style'])
c = {'Cuisine Style': styles, 'freq': range(len(styles))}
cuisine_df = pd.DataFrame(c)

# add 'freq' column to original Data Frame
csv = csv.assign(freq=0)

# same loop as before
for cuisine in np.unique(cuisine_df['Cuisine Style']):
    # get the freq
    freq = cuisine_df.loc[cuisine_df['Cuisine Style'] == cuisine, 
           'freq'].values

    # update value in main df
    csv.loc[csv['Cuisine Style'] == cuisine, 'freq'] = freq   

Output: [image]

As you can see, every row, including rows with duplicate values, has been updated. If your rows are still not being updated, check that the cuisine names are actually equal, i.e. make sure there aren't any hidden spaces or similar differences causing a mismatch.
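One way to check for that kind of mismatch is sketched below. This is not the code I ran above; it is a small example on made-up toy data, using the column names from the question (`cuisine_style`, `freq`). A left merge with `indicator=True` flags the rows that found no match, and printing those keys with `repr()` makes stray whitespace visible.

```python
import pandas as pd

# Toy data (hypothetical): one key in df carries a trailing space
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "cuisine_style": ["Italian", "Polish ", "Italian"],
})
cuisine_df = pd.DataFrame({
    "cuisine_style": ["Italian", "Polish"],
    "freq": [2, 1],
})

# Left merge with indicator=True adds a '_merge' column;
# 'left_only' marks rows of df that found no partner in cuisine_df
check = df.merge(cuisine_df, on="cuisine_style", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "cuisine_style"].unique()
print([repr(v) for v in unmatched])  # repr() exposes hidden whitespace

# Normalise the key on both sides, then merge again
df["cuisine_style"] = df["cuisine_style"].str.strip()
cuisine_df["cuisine_style"] = cuisine_df["cuisine_style"].str.strip()
fixed = df.merge(cuisine_df, on="cuisine_style", how="left")
```

If `unmatched` is empty on the real data, whitespace is probably not the cause and the difference lies elsewhere (for example, in letter case or in the column names used for the join).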

You can read up on selecting and indexing DataFrames in the pandas documentation.

It's quite long, but you can pick out what you need, when you need it.
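As an aside, the loop can probably be avoided altogether. The sketch below is an alternative to the code above, not a copy of it: it turns the frequency dataframe into a lookup Series and uses `Series.map`. The toy data and the choice to fill unknown cuisines with 0 are assumptions made for illustration.

```python
import pandas as pd

# Toy data (hypothetical): 'Thai' has no entry in the frequency table
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "cuisine_style": ["Italian", "Polish", "Italian", "Thai"],
})
cuisine_df = pd.DataFrame({
    "cuisine_style": ["Italian", "Polish"],
    "freq": [2, 1],
})

# Build a lookup Series indexed by cuisine (each cuisine must appear once),
# then map every row's cuisine to its frequency
lookup = cuisine_df.set_index("cuisine_style")["freq"]
df["freq"] = df["cuisine_style"].map(lookup)

# Cuisines missing from the lookup come back as NaN; fill with 0 here
df["freq"] = df["freq"].fillna(0).astype(int)
```

Every row with the same cuisine gets the same value, so duplicates are handled without any special treatment.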

Tyler Wanta
  • I have tried your code but it gives me syntax error. Please suggest. File "", line 9 df.loc[df['cuisine_style' == cuisine, 'freq'] = freq ^ SyntaxError: invalid syntax – Deb May 17 '20 at 00:05
  • Sorry, missed a closing bracket after the `df['cuisine_style'`. I updated the code. – Tyler Wanta May 17 '20 at 04:41