1

I want to create a dictionary that will have multiple values for a key.

I have the following DataFrame df1 like this:

    ID       SUM       COUNT
2   3002    256401.0    15
1   3021    5412.49     21
4   3022    45454.12    12
3   3025    15487.23    45
5   3026    154876.1    88

And I would like to map the ID's to the big lookup table to get multiple values for each ID. My other look up table df2 looks like this,

    ID      Var1    Var2    Var3    Var4
0   3021    Cfd45   shkjs   shkjs   465
1   3021    sfs     None    fjslk   323
2   3021    hjfks   shka    fsad    457
3   3022    None    skank   vssf    236
4   3022    sdkj    vbjajl  bbf     124
5   3025    klgf    vcbj    jgjg    895
6   3026    tuo     None    gdsfs   457
7   3026    klo     ahk     None    569
8   3026    nbjd    alkdjl  sfsfs   124

Now, I want to map these two DataFrames by ID and create a dictionary that has unique ID and multiple values from df2's var1, var2 and var3 columns,

{3021: ['Cfd45', 'shkjs', 'sfs', 'jkdg', 'fjslk', 'hjfks', 'shka', 'fsad'],
3022: ['skank', 'vssf', 'sdkj', vbjajl', 'bbf'],
3025: ...}

The values could be in a list or tuple.

For my scenario, the closest I could find was this post. However, I want values from multiple columns when I map to the table. How can I proceed with this? Any ideas would be appreciated.

i.n.n.m
  • 2,936
  • 7
  • 27
  • 51

1 Answers1

1

Let try this on df2:

df2.set_index('ID')[['Var1','Var2','Var3']].stack().groupby(level=0).apply(lambda x: x.unique()).to_dict()

Output:

    {3021: array(['Cfd45', 'shkjs', 'sfs', 'None', 'fjslk', 'hjfks', 'shka', 'fsad'], dtype=object),
 3022: array(['None', 'skank', 'vssf', 'sdkj', 'vbjajl', 'bbf'], dtype=object),
 3025: array(['klgf', 'vcbj', 'jgjg'], dtype=object),
 3026: array(['tuo', 'None', 'gdsfs', 'klo', 'ahk', 'nbjd', 'alkdjl', 'sfsfs'], dtype=object)}

OR

df2.set_index('ID')[['Var1','Var2','Var3']].stack().groupby(level=0).apply(lambda x: x.unique()).to_json()

Output:

'{"3021":["Cfd45","shkjs","sfs","None","fjslk","hjfks","shka","fsad"],"3022":["None","skank","vssf","sdkj","vbjajl","bbf"],"3025":["klgf","vcbj","jgjg"],"3026":["tuo","None","gdsfs","klo","ahk","nbjd","alkdjl","sfsfs"]}'
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • thank you! This is a good approach. However, I just need I only need `var1`, `var2` and `var3`. Also, I assume this approach will take lot of time when i have many records. – i.n.n.m Oct 07 '17 at 19:52
  • @i.n.n.m Okay, added the restriction for Var1, Var2, Var3 – Scott Boston Oct 07 '17 at 19:59
  • nice, had some other columns with about over a million record and it took long time. Now that I limit to columns, it works. I was trying this, here, https://stackoverflow.com/questions/44713968/pandas-dataframe-to-dict-on-multiple-columns-and-values-to-list/44713994#comment80198211_44713994. I guess, the difference is using `stack()` in your code to add multiple columns? – i.n.n.m Oct 07 '17 at 20:08
  • 1
    Yes, I am using stack() to align all thos Var's in to a single column the using the same technique as described in that SO post except I am getting unique values. – Scott Boston Oct 07 '17 at 20:11