0

I have the following list of dictionaries with same KEY, I want to convert it to a dataframe where the same Key will be grouped as column and the likewise the values.

a =[{'0WVj9KjC13RO1aj0KqPpHQ': {'Height': 3360.0}},{'0WVj9KjC13RO1aj0KqPpHQ': {'Length': 3842.12759802427}},{'0WVj9KjC13RO1aj0KqPpHQ': {'Width': 250.0}},{'0WVj9KjC13RO1aj0KqPpHQ':{'GrossFootprintArea': 0.960531899506066}},{'0WVj9KjC13RO1aj0KqPpHQ': {'NetVolume': 3227387182.34039}},{'0WVj9KjC13RO1aj0KqPpHQ': {'NetSideArea': 12909548.7293616}},{'0WVj9KjC13RO1aj0KqPpHQ': {'NetSideArea': 29.4201612577353}},{'0WVj9KjC13RO1aj0KqPpHQ':{'NetVolume': 3.22738718234038}},{'0WVj9KjC13RO1al0GqPold': {'Height': 3000.0}},{'0WVj9KjC13RO1al0GqPold': {'Length': 5070.0}},{'0WVj9KjC13RO1al0GqPold': {'Width': 250.0}},{'0WVj9KjC13RO1al0GqPold': {'GrossFootprintArea': 1.2675}},{'0WVj9KjC13RO1al0GqPold': {'NetVolume': 3802500000.0}},{'0WVj9KjC13RO1al0GqPold': {'NetSideArea': 15210000.0}},{'0WVj9KjC13RO1al0GqPold': {'NetSideArea': 30.47539488}},{'0WVj9KjC13RO1al0GqPold': {'NetVolume': 3.08609936}}]

Thus far I have done

df_a = pd.DataFrame(a)

this is what I got but i want the VALUES to be columns of that specific KEY

I want the output to be enter image description here

  • Your dict has 2 keys, what is the logic behind showing only the `0WVj9KjC13RO1al0GqPold` key in the resulting dataframe? And what would be the values for those columns? Can you update your question with the to-be dataframe (not just the columns)? Also, please show us what you've tried so far (remember, SO is not a code-writing service). – DocZerø Mar 23 '22 at 11:14
  • You need to [flatten your dictionaries](https://stackoverflow.com/questions/6027558/flatten-nested-dictionaries-compressing-keys) and then you might be able to use [`pd.DataFrame.from_records()`](https://stackoverflow.com/questions/6027558/flatten-nested-dictionaries-compressing-keys) or [`pd.DataFrame.from_dict()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.from_dict.html) – Mushroomator Mar 23 '22 at 11:15
  • @DocZerø I am new to python and stackoverflow... I am learning. Thanks for the comment – Apollos Bangalu Mar 23 '22 at 11:51

1 Answers1

0

You have duplicate columns, so assuming you will want to number the duplicate columns and assuming the duplicate records appear in same order:

data = defaultdict(dict)
for x in a:
  id_ = list(x.keys())[0]
  for k, v in x[id_].items():
    if k in data[id_]:
      i = 1
      k = f"{k}_{i}"
      while k in data[id_]:
        i += 1
        k = f"{k}_{i}"
    data[id_][k] = v

df = pd.DataFrame.from_dict(data).T

Output:

    
                        Height  Length  Width   GrossFootprintArea  NetVolume   NetSideArea NetSideArea_1   NetVolume_1
0WVj9KjC13RO1aj0KqPpHQ  3360.0  3842.127598 250.0   0.960532    3.227387e+09    1.290955e+07    29.420161   3.227387
0WVj9KjC13RO1al0GqPold  3000.0  5070.000000 250.0   1.267500    3.802500e+09    1.521000e+07    30.475395   3.086099
mujjiga
  • 16,186
  • 2
  • 33
  • 51
  • Dear @mujjiga, Thank you for the suggestion. I tried using your suggestion but it gives an error. Maybe I do not understand (since I am a newbie). Can you explain more particularly the "defaultdict(dict)". The duplicate column name solution is a good observation. Thanks. – Apollos Bangalu Mar 23 '22 at 13:55
  • import `from collections import defaultdict` – mujjiga Mar 23 '22 at 18:42