i have the following dataframe and i want to normalize the column Ambience, to get one column for touristy, romantic etc. Im probably missing something obvious, so i really need some help here
business_id Ambience
0 6iYb2HFDywm3zjuRg0shjw {'touristy': False, 'hipster': False, 'romanti...
1 tCbdrRPZA0oiIYSmHG3J0w {'romantic': False, 'intimate': False, 'touris...
2 bvN78flM8NLprQ1a1y5dRg NaN
3 oaepsyvc0J17qwi8cfrOWg NaN
4 PE9uqAjdw0E4-8mjGl3wVA NaN
... ... ...
160580 D2mHoIDXx9N8mS1pGoKV9Q NaN
160581 bQX-kwVTyZgcdZGEPzce6Q NaN
160582 wvFZ06nmPmQ2-IVoPqVYLA NaN
160583 GB75wPibj3IjNauaoCxyGA NaN
160584 ngmLL5Y5OT-bYHKU0kKrYA {'romantic': False, 'intimate': False, 'classy...
I tried multiple solutions on StackOverflow, but they just dont work for me, for example:
df = df["Ambience"].apply(pd.Series)
or
df["Ambience"] = df["Ambience"].apply(
literal_eval
)
df2 = pd.json_normalize(df["Ambience"])
i tried filling in the NaN beforehand, but that didnt help also:
df.Ambience = df.Ambience.fillna(
{i: {} for i in df.index}
)
or some things i didnt understand to the fullest:
pd.concat(list(df.Ambience.map(json_normalize)))
at this point im just clueless, every solution either doesnt work and or i get some result like:
0
0 {'touristy': False, 'hipster': False, 'romanti...
1 {'romantic': False, 'intimate': False, 'touris...
2 NaN
3 NaN
4 NaN
... ...
160580 NaN
160581 NaN
160582 NaN
160583 NaN
160584 {'romantic': False, 'intimate': False, 'classy...
whats even more confusing, for another similar dataframe the df.apply(pd.Series) worked fine but a little bit slow:
business_id hours
0 6iYb2HFDywm3zjuRg0shjw {'Monday': '11:0-23:0', 'Tuesday': '11:0-23:0'...
1 tCbdrRPZA0oiIYSmHG3J0w {'Monday': '5:0-18:0', 'Tuesday': '5:0-17:0', ...
2 bvN78flM8NLprQ1a1y5dRg {'Thursday': '11:0-18:0', 'Friday': '11:0-18:0...
3 oaepsyvc0J17qwi8cfrOWg None
4 PE9uqAjdw0E4-8mjGl3wVA {'Monday': '16:0-19:0', 'Tuesday': '16:0-19:0'...
... ... ...
160580 D2mHoIDXx9N8mS1pGoKV9Q {'Monday': '9:0-19:0', 'Tuesday': '9:0-19:0', ...
160581 bQX-kwVTyZgcdZGEPzce6Q {'Monday': '10:0-18:0', 'Tuesday': '10:0-18:0'...
160582 wvFZ06nmPmQ2-IVoPqVYLA {'Monday': '0:0-0:0', 'Wednesday': '12:0-17:0'...
160583 GB75wPibj3IjNauaoCxyGA None
df = df["hours"].apply(pd.Series)
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
0 11:0-23:0 11:0-23:0 11:0-23:0 11:0-23:0 11:0-23:0 11:0-23:0 11:0-23:0
1 5:0-18:0 5:0-17:0 5:0-18:0 5:0-18:0 5:0-18:0 5:0-18:0 5:0-18:0
2 NaN NaN NaN 11:0-18:0 11:0-18:0 11:0-18:0 11:0-18:0
3 NaN NaN NaN NaN NaN NaN NaN
4 16:0-19:0 16:0-19:0 16:0-19:0 16:0-19:0 16:0-19:0 9:0-11:0 NaN
... ... ... ... ... ... ... ...
160580 9:0-19:0 9:0-19:0 9:0-19:0 9:0-19:0 9:0-19:0 9:0-19:0 9:0-19:0
160581 10:0-18:0 10:0-18:0 10:0-18:0 10:0-18:0 10:0-18:0 9:0-17:0 NaN
160582 0:0-0:0 NaN 12:0-17:0 19:0-21:0 19:0-21:0 15:30-21:0 16:0-18:0
160583 NaN NaN NaN NaN NaN NaN NaN
160584 NaN 17:0-21:0 17:0-21:0 17:0-21:0 17:0-21:0 11:30-21:0 12:0-19:30