0

I have json data. I converted it into dataframe. But some columns are coming as list of dictionary. I want to change them into individual columns.

Example:

dataframe:

SourceRelation_00000000-0000-0000-0000-000000007041     FullName      StringAttribute_00000000-0000-0000-0000-000000003114
[{'SourceRelation_00000000-0000-0000-0000-000000007041_Target_DisplayName': 'Address Confidence Indicator', 'SourceRelation_00000000-0000-0000-0000-000000007041_Target_FullName': 'Address Confidence Indicator', 'SourceRelation_00000000-0000-0000-0000-000000007041_Target_Id': '60163bd0-a066-43f0-a9ba-354b453fbd63', 'SourceRelation_00000000-0000-0000-0000-000000007041_Id': '2e6e524d-a232-4822-a86f-902041b691ce'}]      A      [{'StringAttribute_00000000-0000-0000-0000-000000003114_Value': 'Active', 'StringAttribute_00000000-0000-0000-0000-000000003114_Id': 'a983071f-75aa-472b-bfd4-ce8b5e4e89fb'}]
[{'SourceRelation_00000000-0000-0000-0000-000000007041_Target_DisplayName': 'Address Delivery Confidence', 'SourceRelation_00000000-0000-0000-0000-000000007041_Target_FullName': 'Address Delivery Confidence', 'SourceRelation_00000000-0000-0000-0000-000000007041_Target_Id': 'dc9ff2dd-6345-4df7-98d5-a4099e6dee2f', 'SourceRelation_00000000-0000-0000-0000-000000007041_Id': '3844f2ff-f15c-4529-8075-8639350c18cb'}]      B      [{'StringAttribute_00000000-0000-0000-0000-000000003114_Value': 'HIGH', 'StringAttribute_00000000-0000-0000-0000-000000003114_Id': '5183f80b-8777-4417-b83d-fecb8f4e6c6b'}]

In final output I waant:

SourceRelation_00000000-0000-0000-0000-000000007041_Target_DisplayName    SourceRelation_00000000-0000-0000-0000-000000007041_Target_FullName    SourceRelation_00000000-0000-0000-0000-000000007041_Target_Id    SourceRelation_00000000-0000-0000-0000-000000007041_Id     FullName      StringAttribute_00000000-0000-0000-0000-000000003114_Value      StringAttribute_00000000-0000-0000-0000-000000003114_Id

Address Confidence Indicator     Address Confidence Indicator     60163bd0-a066-43f0-a9ba-354b453fbd63      2e6e524d-a232-4822-a86f-902041b691ce      A      Active     a983071f-75aa-472b-bfd4-ce8b5e4e89fb
Address Delivery Confidence      Address Delivery Confidence      dc9ff2dd-6345-4df7-98d5-a4099e6dee2f      3844f2ff-f15c-4529-8075-8639350c18cb      B      High       5183f80b-8777-4417-b83d-fecb8f4e6c6b   

Till Now I used below code-

with open (r'D:\Users\SPate233\Downloads\JGVCC\json output.txt',"r") as f:
    data = json.load(f)
print(data['aaData'][0])
df = pd.json_normalize(data['aaData'])

print(df.head())
df.to_csv(r'D:\Users\SPate233\Downloads\JGVCC\json_output.csv', index=False, sep=',')
         
amisha
  • 49
  • 2
  • 8
  • 1
    Please can you post precisely how your data is formatted. See https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – coffeinjunky Aug 08 '21 at 09:42

1 Answers1

0

try using explode()+agg():

cols=['SourceRelation_00000000-0000-0000-0000-000000007041',
      'StringAttribute_00000000-0000-0000-0000-000000003114']
out=(pd.DataFrame(df[cols].stack().explode().values.tolist())
       .agg(sorted,key=pd.isnull).dropna().join(df['FullName']))

OR

seperately explode them and then join:

s1=pd.DataFrame((df['SourceRelation_00000000-0000-0000-0000-000000007041'].explode().tolist()))
s2=pd.DataFrame((df['StringAttribute_00000000-0000-0000-0000-000000003114'].explode().tolist()))
out=s1.join(s2).join(df['FullName'])

Now if you print out you will get the desired output

dataframe used:

,SourceRelation_00000000-0000-0000-0000-000000007041,FullName,StringAttribute_00000000-0000-0000-0000-000000003114
0,"[{'SourceRelation_00000000-0000-0000-0000-000000007041_Target_DisplayName': 'Address Confidence Indicator', 'SourceRelation_00000000-0000-0000-0000-000000007041_Target_FullName': 'Address Confidence Indicator', 'SourceRelation_00000000-0000-0000-0000-000000007041_Target_Id': '60163bd0-a066-43f0-a9ba-354b453fbd63', 'SourceRelation_00000000-0000-0000-0000-000000007041_Id': '2e6e524d-a232-4822-a86f-902041b691ce'}]",A,"[{'StringAttribute_00000000-0000-0000-0000-000000003114_Value': 'Active', 'StringAttribute_00000000-0000-0000-0000-000000003114_Id': 'a983071f-75aa-472b-bfd4-ce8b5e4e89fb'}]"
1,"[{'SourceRelation_00000000-0000-0000-0000-000000007041_Target_DisplayName': 'Address Delivery Confidence', 'SourceRelation_00000000-0000-0000-0000-000000007041_Target_FullName': 'Address Delivery Confidence', 'SourceRelation_00000000-0000-0000-0000-000000007041_Target_Id': 'dc9ff2dd-6345-4df7-98d5-a4099e6dee2f', 'SourceRelation_00000000-0000-0000-0000-000000007041_Id': '3844f2ff-f15c-4529-8075-8639350c18cb'}]",B,"[{'StringAttribute_00000000-0000-0000-0000-000000003114_Value': 'HIGH', 'StringAttribute_00000000-0000-0000-0000-000000003114_Id': '5183f80b-8777-4417-b83d-fecb8f4e6c6b'}]"

df=pd.read_clipboard(',')
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41