
I have a pandas dataframe, which looks like this:

[Image: dataframe with 'Supplier' and 'Item' columns]

I want to create a dictionary from this frame which looks like this:

{'s1': ['abc', 'xyz', 'qwe'], 's2': ['xyz', ...], ...}

The frame has around 3M rows and 400k unique 'Supplier' values. What is the best way to build this dictionary efficiently?

ROHIT SHARMA

Comments:

  • `df.groupby('Supplier')['Item'].agg(list).to_dict()`? [GroupBy results to dictionary of lists](https://stackoverflow.com/q/29876184/15497888) – Henry Ecker Sep 15 '21 at 19:32
  • That is a very large dict, which is unlikely to be efficient (and certainly less efficient than using the dataframe directly). – 9769953 Sep 15 '21 at 19:32
  • I don't know that "certainly less efficient than using the dataframe directly" is correct, as lookups by Supplier in the dictionary would be constant time, whereas they are linear time in the DataFrame, assuming Supplier is a column. – Henry Ecker Sep 15 '21 at 19:35
  • If Supplier is unique, then making Supplier an index would achieve that O(1) lookup, unless the user has some other reason to use a dictionary. – sammywemmy Sep 15 '21 at 21:10
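A minimal sketch of the approach suggested in the comments, assuming the frame has 'Supplier' and 'Item' columns (the sample data below is invented for illustration; the real frame has ~3M rows):

```python
import pandas as pd

# Small stand-in for the real 3M-row frame; column names 'Supplier'
# and 'Item' are taken from the comments on the question.
df = pd.DataFrame({
    'Supplier': ['s1', 's1', 's1', 's2', 's2'],
    'Item':     ['abc', 'xyz', 'qwe', 'xyz', 'def'],
})

# groupby + agg(list) collects one list of items per supplier in a
# single vectorized pass; to_dict() then converts the resulting
# Series (indexed by Supplier) into a plain {supplier: [items]} dict.
result = df.groupby('Supplier')['Item'].agg(list).to_dict()
print(result)  # {'s1': ['abc', 'xyz', 'qwe'], 's2': ['xyz', 'def']}
```

Note that with 400k unique suppliers the resulting dict itself is large; if the goal is only fast per-supplier lookup, setting Supplier as the index (as sammywemmy notes) avoids materializing the dict at all.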

0 Answers