0

I am trying to process publicly listed companies financials. I have downloaded the data now I am trying to convert it to JSON format.

There are subsections in the table and used 4 ~'s to denote 1 indentation and 8 for 2 indentations as below:

  • one indentation represents 1 level down
  • double indentation represents 2 levels down

For example Cost of Goods Sold (COGS) incl. D&A is the section header and COGS Growth should be captured as a child element of Cost of Goods Sold (COGS) incl. D&A.

Can you please help me in identifying a way as to how to convert this dataframe to a JSON file?

Table representing Data Frame

|                                       Item  Item|      2016|     2017 |    2018 |    2019   |     2020 |  5-year trend|
|                                     :---------: |    :----:|   :----: |  :----: |  :----:   |   :----: |:------------:|
| Sales/Revenue                                   |-         |-         |-        | -         |615.82K   | NaN          |
| ~~~~Sales Growth                                |-         |-         |-        | -         |-         | NaN          |
| Cost of Goods Sold (COGS) incl. D&A             |684       |5.44K     |3.14K    | 32.5K     |-         | NaN          |
| ~~~~COGS Growth                                 |-         |694.59%   |-42.19%  | 934.31%   |-         | NaN          |
| ~~~~COGS excluding D&A                          |-         |-         |-        | -         |-         | NaN          |
| ~~~~Depreciation & Amortization Expense         |684       |5.44K     |3.14K    | 32.5K     |41.83K    | NaN          |
| ~~~~~~~~Depreciation                            |684       |5.44K     |3.14K    | 32.5K     |41.83K    | NaN          |
| ~~~~~~~~Amortization of Intangibles             |-         |-         |-        | -         |-         | NaN          |
| Gross Income                                    |(684)     |(5.44K)   |(3.14K)  | (32.5K)   |-         | NaN          |
| ~~~~Gross Income Growth                         |-         |-694.59%  |42.19%   | -934.31%  |-         | NaN          |
| ~~~~Gross Profit Margin                         |-         |-         |-        | -         |-         | NaN          |
| SG&A Expense                                    |1.91M     |4.79M     |5.88M    | 9.5M      |9.63M     | NaN          |
| ~~~~SGA Growth                                  |-         |151.12%   |22.61%   | 61.51%    |1.37%     | NaN          |
| ~~~~Research & Development                      |-         |-         |-        | -         |-         | NaN          |
| ~~~~Other SG&A                                  |1.91M     |4.79M     |5.88M    | 9.5M      |9.63M     | NaN          |
| ~~~~Other Operating Expense                     |-         |-         |-        | -         |-         | NaN          |
| Unusual Expense                                 |-         |-         |-        | -         |-         | NaN          |
| EBIT after Unusual Expense                      |-         |-         |-        | -         |-         | NaN          |
| Non Operating Income/Expense                    |-         |-         |(52.76K) | 60.09K    |(2.2K)    | NaN          |
| Non-Operating Interest Income                   |8.9K      |170.93K   |59.8K    | 50.79K    |19.15K    | NaN          |
| Equity in Affiliates (Pretax)                   |-         |-         |-        | -         |-         | NaN          |
| Interest Expense                                |-         |-         |-        | -         |115.55K   | NaN          |
| ~~~~Interest Expense Growth                     |-         |-         |-        | -         |-         | NaN          |
| ~~~~Gross Interest Expense                      |-         |-         |-        | -         |115.55K   | NaN          |
| ~~~~Interest Capitalized                        |-         |-         |-        | -         |-         | NaN          |

Table Organized in subsections

Item Item Subsection1 Subsection2 2016 2017 2018 2019 2020 5-year trend
Sales/Revenue - - - - 615.82K NaN
Sales Growth - - - - - NaN
Cost of Goods Sold (COGS) incl. D&A 684 5.44K 3.14K 32.5K - NaN
COGS Growth - 694.59% -42.19% 934.31% - NaN
COGS excluding D&A - - - - - NaN
Depreciation & Amortization Expense 684 5.44K 3.14K 32.5K 41.83K NaN
Depreciation 684 5.44K 3.14K 32.5K 41.83K NaN
Amortization of Intangibles - - - - - NaN
Gross Income (684) (5.44K) (3.14K) (32.5K) - NaN
Gross Income Growth - -694.59% 42.19% -934.31% - NaN
Gross Profit Mar - - - - - NaN
SG&A Expense 1.91M 4.79M 5.88M 9.5M 9.63M NaN
SGA Growth - 151.12% 22.61% 61.51% 1.37% NaN
Research & Development - - - - - NaN
Other SG&A 1.91M 4.79M 5.88M 9.5M 9.63M NaN
Other Operating Expense - - - - - NaN
Unusual Expense - - - - - NaN
EBIT after Unusual Expense - - - - - NaN
Non Operating Income/Expense - - (52.76K) 60.09K (2.2K) NaN
Non-Operating Interest Income 8.9K 170.93K 59.8K 50.79K 19.15K NaN
Equity in Affiliates (Pretax) - - - - - NaN
Interest Expense - - - - 115.55K NaN
Interest Expense Growth - - - - - NaN
Gross Interest Expense - - - - 115.55K NaN
Interest Capitalized - - - - - NaN
etch_45
  • 792
  • 1
  • 6
  • 21
Waseem Ahmed
  • 57
  • 10

1 Answers1

1

I am able to resolve this by adding values to the missing cells and then groupby on 3 columns the code looks like below. Here is the reference I used to build this code

d = (dframe.fillna("-").groupby(['Item  Item','ItemSubsection1','ItemSubsection2'])['2016','2017','2018','2019','2020']
       .apply(lambda x: x.to_dict('r'))
       .reset_index(name='data')
       .groupby(['Item  Item','ItemSubsection1'])['ItemSubsection2','data']
       .apply(lambda x: x.to_dict('r'))
       .reset_index(name='data')
       .groupby('Item  Item')['ItemSubsection1','data']
       .apply(lambda x: x.set_index('ItemSubsection1', 'ItemSubsection2')['data'].to_dict())
       .to_json()
       )
Waseem Ahmed
  • 57
  • 10