0

I want to convert JSON file into proper format. I have a JSON file as given below:

{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red",
    "details":"|seedless:true|,|condition:New|"

},
{
    "fruit": "Almond",
    "size": "small",
    "color": "brown",
    "details":"|Type:dry|,|seedless:true|,|condition:New|"

}

You can see the data in the details can vary.

I want to change it into :

{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red",
    "seedless":"true",
    "condition":"New",

},
{
    "fruit": "Almond",
    "size": "small",
    "color": "brown",
    "Type":"dry",
    "seedless":"true",
    "condition":"New",

}

I have tried doing it in python using pandas as:

import json
import pandas as pd
import re
df = pd.read_json("data.json",lines=True)

#I tried to change the pattern of data in details column as

re1 = re.compile('r/|(.?):(.?)|/')
re2 = re.compile('r\"(.*?)\":\"(.*?)\"')

df.replace({'details' :re1}, {'details' : re2},inplace = True, regex = True);

But that giving output as "objects" in all the rows of details column.

VLAZ
  • 26,331
  • 9
  • 49
  • 67
  • `"details":"|Type:dry,|seedless:true|,|condition:New|"` So, does the position of the comma vary? I cannot figure out a pattern for this piece of string. Btw, your Python code has a typo: re2 does not have a closing quotation mark. – vahdet Aug 26 '20 at 14:56
  • That was a typo i have edited that, thanks @vahdet – Kunal Chaudhari Aug 26 '20 at 14:59
  • If you want json as output why using pandas? do you want output as json or dataframe? – deadshot Aug 26 '20 at 15:06
  • i am trying to convert it into csv and then json – Kunal Chaudhari Aug 26 '20 at 15:09
  • @KunalChaudhari, If one of the answers below fixes your issue, you should accept it (click the check mark next to the appropriate answer). That does two things. It lets everyone know your issue has been resolved to your satisfaction, and it gives the person that helps you credit for the assist. [See here](http://meta.stackexchange.com/a/5235) for a full explanation. – sushanth Aug 26 '20 at 16:14

2 Answers2

0

Try this,

for d in data:
    details = d.pop('details')
    d.update(dict(x.split(":") for x in details.split("|") if ":" in x))

print(data)

[{'color': 'Red',
  'condition': 'New',
  'fruit': 'Apple',
  'seedless': 'true',
  'size': 'Large'},
 {'Type': 'dry',
  'color': 'brown',
  'condition': 'New',
  'fruit': 'Almond',
  'seedless': 'true',
  'size': 'small'}]
sushanth
  • 8,275
  • 3
  • 17
  • 28
0

You can convert the (list of) dictionaries to a pandas data frame.

import pandas as pd

# data is a list of dictionaries
data = [{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red",
    "details":"|seedless:true|,|condition:New|"

},
{
    "fruit": "Almond",
    "size": "small",
    "color": "brown",
    "details":"|Type:dry,|seedless:true|,|condition:New|"

}]

# convert to data frame
df = pd.DataFrame(data)

# remove '|' from details and convert to list
df['details'] = df['details'].str.replace(r'\|', '').str.split(',')

# explode list => one row for each element
df = df.explode('details')

# split details into name/value pair
df[['name', 'value']] = df['details'].str.split(':').apply(lambda x: pd.Series(x))

# drop details column
df = df.drop(columns='details')

print(df)

    fruit   size  color       name value
0   Apple  Large    Red   seedless  true
0   Apple  Large    Red  condition   New
1  Almond  small  brown       Type   dry
1  Almond  small  brown   seedless  true
1  Almond  small  brown  condition   New
jsmart
  • 2,921
  • 1
  • 6
  • 13