4

I am using

pd.read_sql_query() 

to get data from database and then use

to_json(orient='records') 

this is the dataframe:

(1)
  price_formula_id  premium  product_id  exchange  product_name  product_code   weight  
0            30064      0.0        c001       CME          2018            CL      0.3
1            30064      0.0        c002       CME          2018            CL      0.7

(2)
price_formula_id  premium  product_id  exchange  product_name  product_code   weight  
0            30064      NONE        c001       CME          2018            CL      0.3
1            30064      NONE        c002       CME          2018            CL      0.7

to convert to this formation.

[{
    "price_formula_id": "30064",
    "premium": "0.0",
    "product_id": "c001",
    "exchange": "CME",
    "product_name": "2018",
    "product_code": "CL",
    "weight": "0.3"
},
{
    "price_formula_id": "30064",
    "premium": "0.0",
    "product_id": "c002",
    "exchange": "CME",
    "product_name": "2018",
    "product_code": "CL",
    "weight": "0.7"
}]

but what I really want should be like this :

 { 
   "price_formula_id": "30064",
   "premium": "0.0",
   "basket": 
    [
     {"product_id": "c001",
      "exchange": "CME",
      "product_name": "2018",
      "product_code": "CL",
      "weight": "0.3"
     },
     {
      "product_id": "c002",
      "exchange": "CME",
      "product_name": "2018",
      "product_code": "CL",
      "weight": "0.7"
     }
    ]
 }

I need to group the same info and set a new index 'basket' for the rest. how could I make it? Thanks very much.

foglerit
  • 7,792
  • 8
  • 44
  • 64
Alex
  • 239
  • 2
  • 9

1 Answers1

6

Use groupby with custom function with to_dict for all columns filtered by difference, reset_index and last convert it to_json:

cols = df.columns.difference(['price_formula_id','premium'])
j = (df.groupby(['price_formula_id','premium'])[cols]
       .apply(lambda x: x.to_dict('r'))
       .reset_index(name='basket')
       .to_json(orient='records'))
print (j)

[{
    "price_formula_id": 30064,
    "premium": 0.0,
    "basket": [{
            "exchange": "CME",
            "product_code": "CL",
            "product_id": "c001",
            "product_name": 2018,
            "weight": 0.3
        },
        {
            "exchange": "CME",
            "product_code": "CL",
            "product_id": "c002",
            "product_name": 2018,
            "weight": 0.7
        }
    ]
}]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • hi, thx for ur comment. but I got a error for this method said: reset_index() got an unexpected keyword argument 'name'. I check the declaration of reset_index() and this no params for it, how should I change it? thx – Alex Jul 26 '18 at 06:46
  • @Alex - You can try `.reset_index(name='basket')` change to `.reset_index().rename(columns={'index':'basket'})`, change value for rename if necessary – jezrael Jul 26 '18 at 06:48
  • one more thing need to ask what's function of ' r ' ? thx @jezrael – Alex Jul 26 '18 at 07:09
  • It is `records` like `orient='records'` – jezrael Jul 26 '18 at 07:09
  • it works, thx. but when i got set a column as null(like 'premium': ' '), it will return [ ], I cant figure out why – Alex Jul 26 '18 at 07:24
  • @Alex - Can you change sample data? – jezrael Jul 26 '18 at 07:25
  • yea, plz go check the dataframe sample (2) in the top – Alex Jul 26 '18 at 07:33
  • @Alex - Maybe help first step convert missing values to `null` string like `df = df.fillna('null')` – jezrael Jul 26 '18 at 07:42