I need help converting a DataFrame into a dictionary like the one below, where id is the main key and each inner dictionary keeps only the values greater than 0:

Given dataframe:

id  score1  score2  score3  score4  score5
 1  0.0000  0.1087  0.0000  0.0786       1
 2  0.0532  0.3083  0.2864  0.4464       1
 3  0.0000  0.0840  0.8090  0.2331       1

Expected solution:

[1:{'score2': 0.10865899999999999,
  'score4': 0.078597,
  'score5': 1.0},
 2:{'score1': 0.053238000000000001,
  'score2': 0.308253,
  'score3': 0.28635300000000002,
  'score4': 0.44643299999999997,
  'score5': 1.0},
 3:{'score2': 0.083978999999999998,
  'score3': 0.80898300000000001,
  'score4': 0.23305200000000001,
  'score5': 1.0}]

My attempt: I am using df.to_dict(orient='records'), which gives the output below:

[{'id': 1.0,
  'score1': 0.0,
  'score2': 0.10865899999999999,
  'score3': 0.0,
  'score4': 0.078597,
  'score5': 1.0},
 {'id': 2.0,
  'score1': 0.053238000000000001,
  'score2': 0.308253,
  'score3': 0.28635300000000002,
  'score4': 0.44643299999999997,
  'score5': 1.0},
 {'id': 3.0,
  'score1': 0.0,
  'score2': 0.083978999999999998,
  'score3': 0.80898300000000001,
  'score4': 0.23305200000000001,
  'score5': 1.0}]
cs95
user15051990

1 Answer

I assume your expected output is a dict of dicts. If so, you can use:

df.set_index('id').agg(lambda x: x[x != 0].to_dict(), axis=1).to_dict()

{1: {'score2': 0.1087, 'score4': 0.0786, 'score5': 1.0},
 2: {'score1': 0.0532,
  'score2': 0.3083,
  'score3': 0.2864,
  'score4': 0.4464,
  'score5': 1.0},
 3: {'score2': 0.084, 'score3': 0.809, 'score4': 0.2331, 'score5': 1.0}}
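
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the rounded values printed in the question, so the construction itself is an assumption about how the data was loaded; the conversion line is the same one-liner as above.

```python
import pandas as pd

# Rebuilt from the table in the question (rounded values, assumed dtypes).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "score1": [0.0000, 0.0532, 0.0000],
    "score2": [0.1087, 0.3083, 0.0840],
    "score3": [0.0000, 0.2864, 0.8090],
    "score4": [0.0786, 0.4464, 0.2331],
    "score5": [1, 1, 1],
})

# id becomes the outer key; each row keeps only its non-zero entries.
result = df.set_index("id").agg(lambda x: x[x != 0].to_dict(), axis=1).to_dict()
print(result)
```

Note that score5 comes back as a float, because each row is handled as a single float Series once the integer column is mixed with the float columns.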

Details

Set id as the index, so that each id becomes a key in the output dict:

df.set_index('id')

    score1  score2  score3  score4  score5
id                                        
1   0.0000  0.1087  0.0000  0.0786       1
2   0.0532  0.3083  0.2864  0.4464       1
3   0.0000  0.0840  0.8090  0.2331       1

Next, convert each row to a dictionary, dropping the entries whose value equals 0 (here _ is the previous result in an interactive session):

_.agg(lambda x: x[x != 0].to_dict(), axis=1)

id
1    {'score2': 0.1087, 'score4': 0.0786, 'score5':...
2    {'score1': 0.0532, 'score2': 0.3083, 'score3':...
3    {'score2': 0.084, 'score3': 0.809, 'score4': 0...
dtype: object

The final step is to convert this to a dict of dicts:

_.to_dict()

{1: {'score2': 0.1087, 'score4': 0.0786, 'score5': 1.0},
 2: {'score1': 0.0532,
  'score2': 0.3083,
  'score3': 0.2864,
  'score4': 0.4464,
  'score5': 1.0},
 3: {'score2': 0.084, 'score3': 0.809, 'score4': 0.2331, 'score5': 1.0}}
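
One caveat: the question asks for values greater than 0, while `x != 0` would also keep negative values. If negatives can occur, a possible variant is to go through `to_dict(orient='index')` and filter in a plain dict comprehension. This is only a sketch; the tiny DataFrame and the negative value in it are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data with one negative score, to show the difference from `!= 0`.
df = pd.DataFrame({
    "id": [1, 2],
    "score1": [0.0, -0.5],
    "score2": [0.1087, 0.3083],
})

# {id: {column: value}} for every row, zeros and negatives still included.
rows = df.set_index("id").to_dict(orient="index")

# Keep only the strictly positive entries of each inner dict.
positive = {
    key: {col: val for col, val in row.items() if val > 0}
    for key, row in rows.items()
}
print(positive)
```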
cs95
  • This is great! Is it possible to filter on the id value as well? For example, what if I want to show only the rows where id is in [1, 2] (not 3)? – user15051990 Feb 09 '20 at 05:32
  • @user15051990 You can start with `df = df[df['id'].isin([1, 2])]` and then use the code in my answer as usual. See [this post](https://stackoverflow.com/q/19960077/4909087) for more info. – cs95 Feb 09 '20 at 05:35
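
Putting the comment's suggestion together with the answer, a rough sketch could look like this. The DataFrame is again rebuilt by hand from the question's rounded values (an assumption), and `subset` is just an illustrative name.

```python
import pandas as pd

# Rebuilt from the table in the question (rounded values).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "score1": [0.0000, 0.0532, 0.0000],
    "score2": [0.1087, 0.3083, 0.0840],
    "score3": [0.0000, 0.2864, 0.8090],
    "score4": [0.0786, 0.4464, 0.2331],
    "score5": [1, 1, 1],
})

# Keep only the rows whose id is in the wanted list, then convert as before.
subset = df[df["id"].isin([1, 2])]
result = subset.set_index("id").agg(lambda x: x[x != 0].to_dict(), axis=1).to_dict()
print(result)
```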