0

I'm trying to convert a dataframe like this:

df = pd.DataFrame({'A': ['1', '1', '2', '3'],
                   'B': ['1.1', '1.2', '2.1', '3.1'],
                   'C': ['1.1.1', '1.2.1', '2.1.2', '3.1.5'],
                   'Value': [25, 12, 15, 19],
                   'Period': ['2015', '2018', '2010', '2019']})  

Or

A   B   C   Value   Period
1   1.1 1.1.1   25  2015
1   1.2 1.2.1   12  2018
2   2.1 2.1.2   15  2010
3   3.1 3.1.5   19  2019

to a nested json output like:

{'A': [{'aCode': '1',
        'B': [{'bCode': '1.1',
           'C': [{'cCode': '1.1.1',
              'figures': [{'Value': 25, 'Period': '2015'}]}]}]},
        'B': [{'bCode': '1.2',
           'C': [{'cCode': '1.2.1',
              'figures': [{'Value': 12, 'Period': '2018'}]}]}]},
       {'aCode': '2',
        'B': [{'bCode': '2.1',
          'C': [{'cCode': '2.1.2',
            'figures': [{'Value': 15, 'Period': '2010'}]}]}]},
       {'aCode': '3',
        'B': [{'bCode': '3.1',
          'C': [{'cCode': '3.1.5',
            'figures': [{'Value': 19, 'Period': '2019'}]}]}]}]}

There are some vaguely relevant resources online but they are quite involved. I wonder if there's a clean and clever way to achieve this. Thanks in advance!

  • Does this answer your question? [Inverse of Pandas json\_normalize](https://stackoverflow.com/questions/54776916/inverse-of-pandas-json-normalize) – Kache Aug 15 '22 at 22:20
  • check `doc` -https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html – Divyank Aug 16 '22 at 08:05
  • @Kache, I'm afraid it doesn't. These functions give me a flat JSON, I need nested. – HabibKhan Aug 16 '22 at 16:16
  • You'll probably want to fix your nested json output example, that's not a valid nesting. – Kache Aug 16 '22 at 17:12
  • @Kache, I'm sorry about that. I manually typed it in since that's what I'm trying to get. Also, I'm pretty ignorant in json. – HabibKhan Aug 16 '22 at 18:09

1 Answers1

0

From your question, it sounds like you're having difficulty with the nesting part. This should get you most of the way there:

from collections import defaultdict

df = pd.DataFrame({'A': ['1', '1', '2', '3'],
                   'B': ['1.1', '1.2', '2.1', '3.1'],
                   'C': ['1.1.1', '1.2.1', '2.1.2', '3.1.5'],
                   'Value': [25, 12, 15, 19],
                   'Period': ['2015', '2018', '2010', '2019']})

tree = lambda: defaultdict(tree)

output = tree()
for row in df.itertuples():
    leaf = output[row.A][row.B][row.C]
    figures = leaf.get('figures', [])
    leaf['figures'] = [*figures, {'Value': row.Value, 'Period': row.Period}]

Which will produce:

defaultdict(<function <lambda> at 0x12278ac20>, {
  '1': defaultdict(<function <lambda> at 0x12278ac20>, {
    '1.1': defaultdict(<function <lambda> at 0x12278ac20>, {
      '1.1.1': defaultdict(<function <lambda> at 0x12278ac20>, {
        'figures': [{'Period': '2015', 'Value': 25}],
      }),
    }),
    '1.2': defaultdict(<function <lambda> at 0x12278ac20>, {
      '1.2.1': defaultdict(<function <lambda> at 0x12278ac20>, {
        'figures': [{'Period': '2018', 'Value': 12}],
      }),
    }),
  }),
  '2': defaultdict(<function <lambda> at 0x12278ac20>, {
    '2.1': defaultdict(<function <lambda> at 0x12278ac20>, {
      '2.1.2': defaultdict(<function <lambda> at 0x12278ac20>, {
        'figures': [{'Period': '2010', 'Value': 15}],
      }),
    }),
  }),
  '3': defaultdict(<function <lambda> at 0x12278ac20>, {
    '3.1': defaultdict(<function <lambda> at 0x12278ac20>, {
      '3.1.5': defaultdict(<function <lambda> at 0x12278ac20>, {
        'figures': [{'Period': '2019', 'Value': 19}],
      }),
    }),
  }),
})
Kache
  • 15,647
  • 12
  • 51
  • 79
  • many thanks! This gets me pretty close. However, I need the key-value pair in the output reversed. E.g. '1.1', '1.1.1' are codes for keys in my data. I will also need control over naming the keys. Am I missing something? – HabibKhan Aug 16 '22 at 18:40