-1

I am trying to .map a dictionary to a pandas DataFrame. One of the columns in pandas DataFrame is the key in dict. Here's a reproducible example,

import pandas as pd

df = pd.DataFrame({
                   'id': [0, 1, 2],
                   'nm': ['pn1','pn2','pn3],
                   'v': [np.nan, 25, 0],
                   'd': [{'k1':'v1','k2':'v2','k3':'v3'}]
                 })

dtd = { 
       'pn1':{'s':100,'v':20, sv:{['sv1': 500]}},
       'pn2':{'s':150,'v':30, sv:{['sv1': 400]}}
      }

I'd like to to look the key and parse the value sv1 from the nested dictionary and assign it to pandas series df.v.

Series would look like this:

v
500
400
0
       
 
kms
  • 1,810
  • 1
  • 41
  • 92

2 Answers2

1

Both your definitions raise some kind of error [df because all columns need to have the same length and d just had 1 item, and dtd because ['sv1': 500] is not valid syntax]. However, if you had

# import pandas as pd
# import numpy as np

df = pd.DataFrame({ 'id': [0, 1, 2], 'nm': ['pn1','pn2','pn3'], 
                    'v': [np.nan, 25, 0], 
                    'd': [{'k1':'v1'},{'k2':'v2'},{'k3':'v3'}] })
dtd = { 'pn1':{'s':100,'v':20, 'sv':{'sv1': 500}},
        'pn2':{'s':150,'v':30, 'sv':{'sv1': 400}}  }

then you could use something like the function below for extracting nested values [see this question if you're interested in other variations of it]

def getVal(obj, *keys, vDef=None):
    try: 
        for k in keys: obj = obj[k]
    except: obj = vDef
    return obj

If you want to use .map or .apply to create the new column,

sv1 = df['nm'].map(lambda k: getVal(dtd, k, 'sv', 'sv1', vDef=np.nan))
# sv1 = df['nm'].apply(lambda k: getVal(dtd, k, 'sv', 'sv1', vDef=np.nan))

df['v'] = sv1.combine_first(df['v'])

or you can use .combine directly

df['v'] = df['nm'].combine(df['v'], lambda k,vd: getVal(dtd,k,'sv','sv1',vDef=vd))

or you can also use list comprehension (I find it more readable, but it's mostly just a personal preference)

df['v'] = [getVal(dtd,k,'sv','sv1',vDef=vd) for k,vd in zip(df['nm'],df['v'])]

whichever of the 3 methods is used, the resulting df should look like

╒════╤══════╤══════╤═════╤══════════════╕
│    │   id │ nm   │   v │ d            │
╞════╪══════╪══════╪═════╪══════════════╡
│  0 │    0 │ pn1  │ 500 │ {'k1': 'v1'} │
├────┼──────┼──────┼─────┼──────────────┤
│  1 │    1 │ pn2  │ 400 │ {'k2': 'v2'} │
├────┼──────┼──────┼─────┼──────────────┤
│  2 │    2 │ pn3  │   0 │ {'k3': 'v3'} │
╘════╧══════╧══════╧═════╧══════════════╛

[ printed with print(df.to_markdown(tablefmt='fancy_grid')) ]

Driftr95
  • 4,572
  • 2
  • 9
  • 21
0

I made a few changes to your code to make it executable. Try this and see if it works:

import pandas as pd
import numpy as np

df = pd.DataFrame({
   'id': [0, 1, 2],
   'nm': ['pn1','pn2','pn3'],
   'v': [np.nan, 25, 0],
   'd': [{'k1':'v1'},{'k2':'v2'},{'k3':'v3'}]
})

dtd = { 
   'pn1':{'s':100,'v':20, 'sv':{'sv1': 500}},
   'pn2':{'s':150,'v':30, 'sv':{'sv1': 400}}
  }

df['v'] = (
    df['nm']
    .apply(lambda x: dtd[x]['sv']['sv1'] if x in dtd else np.nan)
    .combine_first(df['v'])
)

output:

   id   nm      v             d
0   0  pn1  500.0  {'k1': 'v1'}
1   1  pn2  400.0  {'k2': 'v2'}
2   2  pn3    0.0  {'k3': 'v3'}

I am using the 'nm' column as the key to look up the value in 'dtd'. If there is no value, then I am leaving the existing value in v.

Zach Flanders
  • 1,224
  • 1
  • 7
  • 10