0

I'm looking to export my DF to Json in a specific format, however I can't get the group-by right and I'm at a loss where to include it.

Dataframe:

  item_type  purch_price sale_price         city    location
0    Iphone         1200       1150          NaN         NaN
1  Computer          700        NaN  Los Angeles  1st street
2  Computer          700        NaN     San Jose  2nd street

Current code:

import json
import pandas as pd

df = pd.read_csv(r'filepath', delimiter=';', header=0)

df = df.fillna('')

def shop_details(row):
    if row['city'] != '' and row['location'] !='':
        return [{'city': row['city'], 'location': row['location']}]
    else:
        return []

df['shop_details'] = df.apply(lambda row: shop_details(row), axis = 1)

df = df.drop(['city', 'location'], axis = 1)

def print_json(text):
    parsed = json.loads(text)
    print(json.dumps(parsed, indent=4, sort_keys=False))

print_json(df.to_json(orient='records'))

* Current Result *

[
    {
        "item_type": "Iphone",
        "purch_price": 1200,
        "sale_price": 1150.0,
        "shop_details": []
    },
    {
        "item_type": "Computer",
        "purch_price": 700,
        "sale_price": "",
        "shop_details": [
            {
                "city": "Los Angeles",
                "location": "1st Street"
            }
        ]
    },
    {
        "item_type": "Computer",
        "purch_price": 700,
        "sale_price": "",
        "shop_details": [
            {
                "city": "San Jose",
                "location": "2nd Street"
            }
        ]
    }
]

Desired output:

[{
    "item_type": "Iphone",
    "purch_price": "1200",
    "sale_price": "1150",
    "shop_details": []
  },
  {
    "item_type": "Computer",
    "purch_price": "700",
    "sale_price": "600",
    "shop_details": [{
        "city": "Los Angeles",
        "location": "1st street"
      },
      {
        "city": "San Jose",
        "location": "2nd street"
      }
    ]
  }
]
  • As seen in the example, I'd like the city and location information to be showed in the 'desired_output' format, rather than having two entries with the full item_type, purchase_price and sale_price.
Deus14
  • 3
  • 2
  • Alternatively I have this code that groups correctly, but of which the output is not correct: g_df = (df. groupby(['item_type', 'purch_price', 'sale_price'])["city", "location"]. apply(lambda x: x.to_dict(orient='records')). reset_index(name='shop_details')) – Deus14 Sep 15 '19 at 16:02
  • Where is `"sale_price": "600"` coming from in your desired output? I don't see that in the dataframe. – chitown88 Sep 16 '19 at 11:12

1 Answers1

0

After you drop the columns, use groupby with 'sum' to concatenate your list of dicts (another simpler example for concatenating lists: link) and then reset the index:

df_grouped=df.groupby(by=['item_type','purch_price','sale_price']).agg({'shop_details':'sum'}).reset_index()

The resulting output of your print_json is then:

[
{
    "item_type": "Computer",
    "purch_price": 700,
    "sale_price": "",
    "shop_details": [
        {
            "city": "'Los Angeles'",
            "location": "'1st street'"
        },
        {
            "city": "'San Jose'",
            "location": "'2nd street'"
        }
    ]
},
{
    "item_type": "Iphone",
    "purch_price": 1200,
    "sale_price": 1150.0,
    "shop_details": []
}
]

Note that I'm not sure where your sale price of 600 comes from as it's not in your original dataframe.

chris
  • 1,267
  • 7
  • 20