I have a pandas DataFrame like the following:

   tree  nodes  classes  cues                              directions  thresholds     exits
1  1     4      i;i;n;i  PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  1;0;1;0.5
2  2     3      i;i;n    PLC2hrOGTT;Age;BMI                >;>;>       126;29;29.7    0;1;0.5
3  3     4      i;i;n;i  PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  1;0;0;0.5
4  4     4      i;i;n;i  PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  1;1;0;0.5
5  5     4      i;i;n;i  PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  0;1;0;0.5
6  6     3      i;i;n    PLC2hrOGTT;Age;BMI                >;>;>       126;29;29.7    0;0;0.5
7  7     4      i;i;n;i  PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  1;1;1;0.5
8  8     4      i;i;n;i  PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  0;0;0;0.5

and I want to convert it to JSON like this (example for the first row only):

[
  {
    "cues": "PLC2hrOGTT", "directions": ">", "thresholds": "126",
    "parent": "null",
    "children": [
      {
        "cues": "Age", "directions": ">", "thresholds": "29",
        "parent": "PLC2hrOGTT",
        "children": [
          {
            "cues": "BMI", "directions": ">", "thresholds": "29.7",
            "parent": "Age",
            "children": [
              {
                "cues": "TimesPregnant", "directions": ">", "thresholds": "6",
                "parent": "BMI",
                "children": [
                  {
                    "cues": "False",
                    "parent": "TimesPregnant"
                  },
                  {
                    "cues": "True",
                    "parent": "TimesPregnant"
                  }
                ]
              },
              {
                "cues": "True",
                "parent": "BMI"
              }
            ]
          },
          {
            "cues": "False",
            "parent": "Age"
          }
        ]
      },
      {
        "cues": "True",
        "parent": "PLC2hrOGTT"
      }
    ]
  }
]

and so on for each row.

Currently, return tree_definitions.to_json(orient='records') doesn't produce this nested structure; it only gives one flat record per row. Is there any way to do this with to_json? If not, what alternative could I use?

Output of tree_definitions.to_json(orient='records'):

[{"tree":1,"nodes":4,"classes":"i;i;n;i","cues":"PLC2hrOGTT;Age;BMI;TimesPregnant","directions":">;>;>;>","thresholds":"126;29;29.7;6","exits":"1;0;1;0.5"},{"tree":2,"nodes":3,"classes":"i;i;n","cues":"PLC2hrOGTT;Age;BMI","directions":">;>;>","thresholds":"126;29;29.7","exits":"0;1;0.5"},{"tree":3,"nodes":4,"classes":"i;i;n;i","cues":"PLC2hrOGTT;Age;BMI;TimesPregnant","directions":">;>;>;>","thresholds":"126;29;29.7;6","exits":"1;0;0;0.5"},{"tree":4,"nodes":4,"classes":"i;i;n;i","cues":"PLC2hrOGTT;Age;BMI;TimesPregnant","directions":">;>;>;>","thresholds":"126;29;29.7;6","exits":"1;1;0;0.5"},{"tree":5,"nodes":4,"classes":"i;i;n;i","cues":"PLC2hrOGTT;Age;BMI;TimesPregnant","directions":">;>;>;>","thresholds":"126;29;29.7;6","exits":"0;1;0;0.5"},{"tree":6,"nodes":3,"classes":"i;i;n","cues":"PLC2hrOGTT;Age;BMI","directions":">;>;>","thresholds":"126;29;29.7","exits":"0;0;0.5"},{"tree":7,"nodes":4,"classes":"i;i;n;i","cues":"PLC2hrOGTT;Age;BMI;TimesPregnant","directions":">;>;>;>","thresholds":"126;29;29.7;6","exits":"1;1;1;0.5"},{"tree":8,"nodes":4,"classes":"i;i;n;i","cues":"PLC2hrOGTT;Age;BMI;TimesPregnant","directions":">;>;>;>","thresholds":"126;29;29.7;6","exits":"0;0;0;0.5"}]

Another view of the pandas DataFrame I get, consisting of 8 different binary trees.

poweredbygt
  • If you simplify your dataframe and the expected output json, I may be able to help you. – Parijat Bhatt Jul 29 '19 at 21:08
  • @Parijat Bhatt I put a picture of the pandas dataframe, maybe it's better to see the columns. What I need from this pandas dataframe are cues, directions, thresholds, and exits. – poweredbygt Jul 29 '19 at 21:24

1 Answer

You will need to work with your data more before to_json can help. First, split each of ["cues", "exits", "directions", "thresholds"] on ";" so that every one of them becomes up to 4 columns. Then you can use groupby to work with (what I assume will then be) "cues0" and so on. Once you have your groupby the way you want, take a look at this awesome code: https://stackoverflow.com/a/50767410/1499803 I'm not sure what it does for missing values (like in the "exits3" and "directions3" columns, which will be empty for the 3-node trees), so YMMV. Hope this helps.
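If the groupby route gets awkward, a plain-Python alternative is to build the nested dict row by row. The sketch below is not the linked code and not a pandas feature; build_tree is a hypothetical helper, and the meaning of "exits" is only inferred from the example in the question: 1 puts a "True" leaf next to the next cue, 0 puts a "False" leaf there, and 0.5 (assumed to appear only on the last cue) ends the tree with both leaves. It takes one record as a plain dict, such as the ones tree_definitions.to_dict(orient='records') should give you.

```python
import json


def build_tree(record):
    """Build a nested dict for one row whose fields are ';'-separated strings."""
    cues = record["cues"].split(";")
    directions = record["directions"].split(";")
    thresholds = record["thresholds"].split(";")
    exits = [float(e) for e in record["exits"].split(";")]

    subtree = None  # the part of the tree below the node currently being built
    # Walk from the last cue up to the first so each node can wrap the one below it.
    for i in reversed(range(len(cues))):
        node = {
            "cues": cues[i],
            "directions": directions[i],
            "thresholds": thresholds[i],
            "parent": cues[i - 1] if i > 0 else "null",
        }
        if exits[i] == 0.5:
            # Assumption: 0.5 marks the final cue, which exits both ways.
            children = [
                {"cues": "False", "parent": cues[i]},
                {"cues": "True", "parent": cues[i]},
            ]
        else:
            # Assumption: 1 means a "True" leaf, 0 means a "False" leaf.
            label = "True" if exits[i] == 1 else "False"
            children = [] if subtree is None else [subtree]
            children.append({"cues": label, "parent": cues[i]})
        node["children"] = children
        subtree = node
    return subtree


# First row of the question's DataFrame, written out as a plain dict.
record = {
    "cues": "PLC2hrOGTT;Age;BMI;TimesPregnant",
    "directions": ">;>;>;>",
    "thresholds": "126;29;29.7;6",
    "exits": "1;0;1;0.5",
}
print(json.dumps([build_tree(record)], indent=2))
```

To cover every row, the same function could be applied to each record in turn and the results dumped with json.dumps. If your exits follow a different convention than the one assumed here, only the two branches inside the loop need to change.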

Back2Basics