
I have a list of dictionaries whose keys are identical but whose values differ, and the order within each dictionary is strictly preserved. I am trying to find an automatic way to add each of these dictionaries to a pandas DataFrame as a new column, but I didn't get the expected output.

original data on gist

Here is the data that I have: old data (on gist).

my attempt

Here is my attempt to populate a DataFrame from multiple dictionaries that share the same keys but have different values (binary values). My goal is to write a handy function that vectorizes this code. Here is my code, which is inefficient but works (on gist):

import pandas as pd

dat = pd.read_csv('old_data.csv', encoding='utf-8')

# typ, anim, bov, cat and foo are the lookup dictionaries (defined in the gist)
dat['type'] = dat['code'].astype(str).map(typ)
dat['anim'] = dat['code'].astype(str).map(anim)
dat['bovin'] = dat['code'].astype(str).map(bov)
dat['catg'] = dat['code'].astype(str).map(cat)
dat['foot'] = dat['code'].astype(str).map(foo)

My code works, but I think it is not vectorized and therefore not efficient. How can I turn these few lines into a simple function? Any ideas on how to do this as efficiently as possible?

Here is my current and desired output:

The output is correct, but the code is not efficient. This is my current output (on gist).

beyond_inifinity
  • You cannot post your full data and full code and expect people to go through it for you and debug it. When asking your question, you have to supply a small example which represents your actual problem. This way you also force yourself to understand your problem fully. Have a look at one of my [questions](https://stackoverflow.com/questions/57774352/fill-in-same-amount-of-characters-where-other-column-is-nan) – Erfan Feb 03 '20 at 20:48
  • @Erfan there is no bug in my code, I just want to simplify the process of populating multiple dictionaries into pandas. If I posted my current code to `SO` it would be a long post that might be a burden to the `SO` community. How can I simplify my current attempt? Any idea? – beyond_inifinity Feb 03 '20 at 20:52
  • @Erfan is correct, there is no way to reproduce your code if you use a read_excel. Reduce the example to something that can be copy/pasted to an editor and works immediately from there, then people can reproduce it and start working from there. – divingTobi Feb 03 '20 at 21:10
  • All the links are broken, please provide a [mcve]. – AMC Feb 04 '20 at 05:18

1 Answer

If you restructure your dictionaries into a dictionary of dictionaries, you can do it in a short loop:

for key, mapping in values.items():
    dat[key] = dat['code'].astype(str).map(mapping)
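
Since the question asks for a handy function, the loop above can also be wrapped in a small helper. The following is only a minimal sketch: the function name `add_mapped_columns`, the sample DataFrame, and the shortened dictionaries are made up for illustration and stand in for `old_data.csv` and the real lookups.

```python
import pandas as pd


def add_mapped_columns(df, key_col, mappings):
    """Return a copy of df with one new column per entry in mappings.

    add_mapped_columns is a hypothetical helper, not part of pandas.
    """
    # Convert the key column to str once and reuse it for every lookup.
    keys = df[key_col].astype(str)
    return df.assign(**{name: keys.map(lookup) for name, lookup in mappings.items()})


# Made-up stand-ins for old_data.csv and the lookup dictionaries.
sample = pd.DataFrame({"code": [20230, 20321, 99999]})
sample_values = {
    "typ": {"20230": "A", "20321": "B"},
    "anim": {"20230": "AOB", "20321": "AOC"},
}

result = add_mapped_columns(sample, "code", sample_values)
print(result)
```

The outer keys become the column names, so using `'type'`, `'bovin'`, `'catg'` and `'foot'` as outer keys would reproduce the column names from the question. Codes that are missing from a dictionary (such as `99999` here) end up as `NaN`.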

Full code:

values = {"typ" :{
    '20230' : 'A',
    '20130' : 'A',
    '20220' : 'A',
    '20120' : 'A',
    '20329' : 'A',
    '20322' : 'A',
    '20321' : 'B',
    '20110' : 'B',
    '20210' : 'B',
    '20311' : 'B'
    } ,

"anim" :{
    '20230' : 'AOB',
    '20130' : 'AOB',
    '20220' : 'AOB',
    '20120' : 'AOB',
    '20329' : 'AOC',
    '20322' : 'AOC',
    '20321' : 'AOC',
    '20110' : 'AOB',
    '20210' : 'AOB',
    '20311' : 'AOC'
    } ,

"bov" :{
    '20230' : 'AOD',
    '20130' : 'AOD',
    '20220' : 'AOD',
    '20120' : 'AOD',
    '20329' : 'AOE',
    '20322' : 'AOE',
    '20321' : 'AOE',
    '20110' : 'AOD',
    '20210' : 'AOD',
    '20311' : 'AOE'
    } ,

"cat" :{
    '20230' : 'AOF',
    '20130' : 'AOG',
    '20220' : 'AOF',
    '20120' : 'AOG',
    '20329' : 'AOF',
    '20322' : 'AOF',
    '20321' : 'AOF',
    '20110' : 'AOG',
    '20210' : 'AOF',
    '20311' : 'AOG'
    } ,

"foo" :{
    '20230' : 'AOL',
    '20130' : 'AOL',
    '20220' : 'AOM',
    '20120' : 'AOM',
    '20329' : 'AOL',
    '20322' : 'AOM',
    '20321' : 'AOM',
    '20110' : 'AOM',
    '20210' : 'AOM',
    '20311' : 'AOM'
    } 
}

import pandas as pd

dat = pd.read_csv('old_data.csv', encoding='utf-8')

# Each outer key becomes a new column, filled by looking up 'code' in the inner dictionary.
for key, mapping in values.items():
    dat[key] = dat['code'].astype(str).map(mapping)
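
As a possible alternative to the loop, a dictionary of dictionaries can be turned into a lookup DataFrame and joined on the code column in a single call. This is a sketch under assumptions: the sample data and shortened dictionaries below are invented, and it assumes the `code` column holds integers, which is why the lookup index is converted to `int`.

```python
import pandas as pd

# Made-up stand-ins for old_data.csv and the lookup dictionaries.
sample = pd.DataFrame({"code": [20230, 20321, 20110]})
sample_values = {
    "typ": {"20230": "A", "20321": "B", "20110": "B"},
    "anim": {"20230": "AOB", "20321": "AOC", "20110": "AOB"},
}

# Outer keys become columns; inner keys (the codes) become the index.
lookup = pd.DataFrame(sample_values)
# Convert the string codes to int so they match sample["code"].
lookup.index = lookup.index.map(int)

# Left join: every row of sample is kept, and the lookup columns are appended.
joined = sample.join(lookup, on="code")
print(joined)
```

Whether this is faster than the loop depends on the data, but it adds all columns in one step and keeps the lookup table in a form that is easy to inspect.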
sdhaus
  • Can’t you specify the data type using a dict, or even just `dtype=str`, in `pandas.read_csv()`? – AMC Feb 04 '20 at 05:19
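
The suggestion in the comment above can be sketched as follows: reading the `code` column as strings with the `dtype` argument of `pandas.read_csv()` removes the need for `.astype(str)` before each `map`. The inline CSV text and the shortened `typ` dictionary are made up so that the example runs without `old_data.csv`.

```python
import io

import pandas as pd

# Made-up CSV content standing in for old_data.csv.
csv_text = "code,name\n20230,a\n20321,b\n"

# Read 'code' as strings up front instead of converting it later.
dat = pd.read_csv(io.StringIO(csv_text), dtype={"code": str})

typ = {"20230": "A", "20321": "B"}  # shortened lookup dictionary
dat["typ"] = dat["code"].map(typ)
print(dat)
```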