
I have a pandas dataframe, which looks like this:

[Image: dataframe with 'Supplier' and 'Item' columns]

I want to create a dictionary from this frame which looks like this:

{'s1': ['abc', 'xyz', 'qwe'], 's2': ['xyz', ...], ...}

The frame has around 3M rows and 400k unique 'Supplier' values. What is the best way to build this dictionary efficiently?

ROHIT SHARMA

Comments:

  • `df.groupby('Supplier')['Item'].agg(list).to_dict()`? [GroupBy results to dictionary of lists](https://stackoverflow.com/q/29876184/15497888) – Henry Ecker Sep 15 '21 at 19:32
  • That is a very large dict, which is unlikely to be efficient (and certainly less efficient than using the dataframe directly). – 9769953 Sep 15 '21 at 19:32
  • I don't know that "certainly less efficient than using the dataframe directly" is correct, as lookups by Supplier in the dictionary would be constant time, whereas they are linear time in the DataFrame, assuming Supplier is a column. – Henry Ecker Sep 15 '21 at 19:35
  • If Supplier is unique, then making Supplier an index would achieve that O(1) lookup, unless the user has some other reason to use a dictionary. – sammywemmy Sep 15 '21 at 21:10
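A minimal sketch of the approach suggested in the comments, assuming the frame has 'Supplier' and 'Item' columns (the sample data below is invented for illustration; the real frame has ~3M rows):

```python
import pandas as pd

# Small stand-in for the real 3M-row frame; column names 'Supplier'
# and 'Item' are taken from the comments on the question.
df = pd.DataFrame({
    'Supplier': ['s1', 's1', 's1', 's2', 's2'],
    'Item':     ['abc', 'xyz', 'qwe', 'xyz', 'def'],
})

# groupby + agg(list) collects one list of items per supplier in a
# single vectorized pass; to_dict() then converts the resulting
# Series (indexed by Supplier) into a plain {supplier: [items]} dict.
result = df.groupby('Supplier')['Item'].agg(list).to_dict()
print(result)  # {'s1': ['abc', 'xyz', 'qwe'], 's2': ['xyz', 'def']}
```

Note that with 400k unique suppliers the resulting dict itself is large; if the goal is only fast per-supplier lookup, setting Supplier as the index (as sammywemmy notes) avoids materializing the dict at all.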

0 Answers