2

I've data stored in pandas dataframe and I want to convert tat into a JSON format. Example data can be replicated using following code

data = {'Product':['A', 'B', 'A'],
        'Zone':['E/A', 'A/N', 'E/A'],
        'start':['08:00:00', '09:00:00', '12:00:00'],
        'end':['12:30:00', '17:00:00', '17:40:00'],
        'seq':['0, 1, 2 ,3 ,4','0, 1, 2 ,3 ,4', '0, 1, 2 ,3 ,4'],
        'store':['Z',"'AS', 'S'", 'Z']
        }

df = pd.DataFrame(data)

I've tried converting it into JSON format using following code

df_parsed = json.loads(df.to_json(orient="records"))

Output generated from above

[{'Product': 'A', 'Zone': 'E/A', 'start': '08:00:00', 'end': '17:40:00', 'seq': '0, 1, 2 ,3 ,4', 'store': 'Z'}, {'Product': 'B', 'Zone': 'A/N', 'start': '09:00:00', 'end': '17:00:00', 'seq': '0, 1, 2 ,3 ,4', 'store': 'AS'}, {'Product': 'A', 'Zone': 'E/A', 'start': '08:00:00', 'end': '17:40:00', 'seq': '0, 1, 2 ,3 ,4', 'store': 'Z'}]

Desired Result:

{
'A': {'Zone': 'E/A', 
'tp': [{'start': [8, 0], 'end': [12, 0], 'seq': [0, 1, 2 ,3 ,4]},
      {'start': [12, 30], 'end': [17, 40], 'seq': [0, 1, 2 ,3 ,4]}],
      
'store': ['Z']
}, 
'B': {'Zone': 'A/N', 
'tp': [{'start': [9, 0], 'end': [17, 0], 'seq': [0, 1, 2 ,3 ,4]}],
      
'store': ['AS', 'S']
}
}

If a product belongs to same store the result for column start, end and seq should be clubbed as shown in desired output. Also start time and end time should be represented like [9,0] if value for time is "09:00:00" only hour and minute needs to be represented so we can discard value of seconds from time columns.

Lopez
  • 461
  • 5
  • 19
  • 3
    You need to transform your DataFrame before making the conversion to JSON format. For example you can try aggregating your data using ```df.groupby('Product')``` or another pandas method. – Sura-da Dec 18 '20 at 19:33
  • @Sura-da if the answer performs some manupilation with data frame then its fine. and `groupby` should be used to club data – Lopez Dec 19 '20 at 05:19
  • If my answer helps then marks it as an accepted answer by clicking on the tick. – Pygirl Dec 19 '20 at 15:05

1 Answers1

2

This will be complicated a bit. So you have to do it step by step:

def funct(row):
    row['start'] = row['start'].str.split(':').str[0:2]
    row['end'] = row['end'].str.split(':').str[0:2]
    row['store'] = row['store'].str.replace("'", "").str.split(', ')

    d = (row.groupby('Zone')[row.columns[1:]]
        .apply(lambda x: x.to_dict(orient='record'))
        .reset_index(name='tp').to_dict(orient='row'))
    return d

di = df.groupby(['Product'])[df.columns[1:]].apply(funct).to_dict()

di:

{'A': [{'Zone': 'E/A',
   'tp': [{'start': ['08', '00'],
     'end': ['12', '30'],
     'seq': '0, 1, 2 ,3 ,4',
     'store': ['Z']},
    {'start': ['12', '00'],
     'end': ['17', '40'],
     'seq': '0, 1, 2 ,3 ,4',
     'store': ['Z']}]}],
 'B': [{'Zone': 'A/N',
   'tp': [{'start': ['09', '00'],
     'end': ['17', '00'],
     'seq': '0, 1, 2 ,3 ,4',
     'store': ['AS', 'S']}]}]}

Explanation:

  • 1st create your own custom function.
  • change the start, end column to a list form.
  • group by Zone and apply to_dict to rest of the columns.
  • reset index and name the column that are having [{'start': ['08', '00'], 'end': ['12', '30'], 'seq': '0, 1, 2 ,3 ,4', as tp.
  • now apply to_dict to the whole result and return it.

Ultimately you need to convert your dataframe into this below format once you are able to do it the rest of the thing will become easy for you.

Zone    tp
E/A    [{'start': ['08', '00'], 'end': ['12', '30'], ...
A/N    [{'start': ['09', '00'], 'end': ['17', '00'], ... 

EDIT:

import pandas as pd
import ast

def funct(row):
    y = row['start'].str.split(':').str[0:-1]
    row['start'] = row['start'].str.split(':').str[0:2].apply(lambda x: list(map(int, x)))
    row['end'] = row['end'].str.split(':').str[0:2].apply(lambda x: list(map(int, x)))
    row['seq'] = row['seq'].apply(lambda x: list(map(int, ast.literal_eval(x))))
    row['store'] = row['store'].str.replace("'", "")

    d = (row.groupby('Zone')[row.columns[1:-1]]
        .apply(lambda x: x.to_dict(orient='record'))
        .reset_index(name='tp'))
    ######### For store create a different dataframe and then merge it to the other df ########
    d1 = (row.groupby('Zone').agg({'store': pd.Series.unique}))
    d1['store'] = d1['store'].str.split(",")
    d_merged = (pd.merge(d,d1, on='Zone', how='left')).to_dict(orient='record')[0]
    return d_merged

di = df.groupby(['Product'])[df.columns[1:]].apply(funct).to_dict()

di:

{'A': {'Zone': 'E/A',
  'tp': [{'start': [8, 0], 'end': [12, 30], 'seq': [0, 1, 2, 3, 4]},
   {'start': [12, 0], 'end': [17, 40], 'seq': [0, 1, 2, 3, 4]}],
  'store': ['Z']},
 'B': {'Zone': 'A/N',
  'tp': [{'start': [9, 0], 'end': [17, 0], 'seq': [0, 1, 2, 3, 4]}],
  'store': ['AS', ' S']}}
 
Pygirl
  • 12,969
  • 5
  • 30
  • 43
  • your answer is almost correct except a few things that are part of desired result but can't be achieved through your solution.like `start` ,`end` and `seq` where not in form of string you can refer to my question to see the actual output that is required. start/end should be represented as `[8, 0]` also sequence should be represented as `[1, 2, 3, 4, 5]`. You've clubbed `store` with `tp` you can see my desired result it is not part of `tp` Can you a provide fix? – Lopez Dec 19 '20 at 18:55