0

I have this dataframe. How I got this dataframe was due to using .to_dict(). It manages to split the initial dictionary into individual columns on key:value pairs. However some keys has a dictionary like value. For instance like the Ambience column here.

I believe the initial dictionary is something like this

attribute = {
    "AcceptsInsurance": "null",
    "AgesAllowed": "null',
    "Ambience": {
        'touristy': False,
        'hipster': False,
        'romantic': False,
        'divey': False,
        'intimate': False
      }
    } ... 

To be exact, enter image description here

The column value are in a dictionary like object but are not dictionary. The column are in object type.

+---------------+--------------------------------------------------------------------------------------------+                  
|    business_id|                                                                                    Ambience|  
+---------------+--------------------------------------------------------------------------------------------+
|6iYb2HFDywm3zjw|{'touristy': False, 'hipster': False, 'romantic': False, 'divey': False, 'intimate': False} | 
|drRPZA0oiIYSmqs|{'romantic': False, 'intimate': False, 'classy': False, False, 'casual': True}              |  
+---------------+-------------+------------------------------------------------------------------------------+

I would like to hence further split it and create new column for each of the element of the key:value pair. So something like this,

+-----------------+------------------+-------------------+
|      business_id| Ambience_touristy|  Ambience_romantic|
+-----------------+------------------+-------------------+
|  6iYb2HFDywm3zjw|             False|              False|
|  drRPZA0oiIYSmqs|              Null|              False|
+-----------------+------------------+-------------------+

How do I go about doing this? I am new to this and would appreciate any help.

reign
  • 47
  • 7
  • 2
    could you share the dictionary that constructs the dataframe you have here? –  Feb 05 '22 at 06:12

1 Answers1

0

re-using a function from another SO answer called unnesting() with modifications. edit: please note that the string in row 2 is not a valid Python dictionary. this solution assumes valid Python dictionaries.

def unnesting(df, explode):
    """ modified from here: https://stackoverflow.com/a/53218939/42346 """
    df1 = pd.concat( 
        [ 
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x + "_") 
            if x == "Ambience" 
            else pd.DataFrame(df[x], index=df.index) 
            for x in explode 
        ], 
        axis=1, 
    ) 
    return df1.join(df.drop(explode, 1), how="left")      

you can then do:

import ast

df["Ambience"] = df["Ambience"].apply(ast.literal_eval) 
df_unnested = unnesting(df, ["business_id", "Ambience"])

result:

       business_id Ambience_touristy Ambience_hipster  Ambience_romantic  Ambience_divey  Ambience_intimate Ambience_classy Ambience_casual
1  6iYb2HFDywm3zjw             False            False              False           False              False             NaN             NaN
2  drRPZA0oiIYSmqs               NaN              NaN              False           False              False           False            True
mechanical_meat
  • 163,903
  • 24
  • 228
  • 223
  • thank you for replying. I am however receiving an 'malformed node or string: None' error – reign Feb 05 '22 at 06:51
  • @reign: that occurs because the example dictionary in the 2nd row is not a valid Python dictionary. be sure all of your dictionaries are valid Python dictionaries. – mechanical_meat Feb 05 '22 at 06:51