2

I have a dataset in this form:

df.head(2)

Employee_Name    EID        Details 
Lisa            FG546HJ    {"Summary": "Worked as a HR professional.", "Services_List": ["HR", "Marketing", "Hiring"],"On_payroll?": false}
Martin          H5644HH    {"Summary": "Worked as a UI Designer.", "Services_List": ["Frontend", "UI", "UX"],"On_payroll?": True}

As you can see, the 'Details' column is in json format...But I want the output to be in this form:

Employee_Name     EID      Summary                        Services_List         On_Payroll?
Lisa             FG546HJ   Worked as a HR professional.   HR,Marketing,Hiring    false
Martin           H5644HH   Worked as a UI Designer.       Frontend,UI,UX         True

Can someone please suggest on how to convert it.

Patrik
  • 75
  • 6
  • 1
    Does this answer your question? [Parsing a JSON string which was loaded from a CSV using Pandas](https://stackoverflow.com/questions/20680272/parsing-a-json-string-which-was-loaded-from-a-csv-using-pandas) – ftorre May 29 '23 at 15:02
  • @ftorre , it's a different case here has you can see this json has few arrays elements here like 'Services_List' which need to be converted as csv... – Patrik May 29 '23 at 15:17
  • Could you please provide input data in a reproducible form? please run `.head(2).to_dict(orient='records')` and put it as a dict for your questions. It will help alot and avoid to retype your data. – valentinmk May 29 '23 at 15:45
  • The question title "Unable to convert" implies that you tried but were unsuccessful. Please show us what you tried. – John Gordon May 29 '23 at 16:20

4 Answers4

0

You can covert the json data inside the rows to the dataframe and concat it to the original dataframe.

First, create the original dataframe as you mention.

data = {"Employee_Name": ['Lisa', 'Martin'], "EID": ['FG546HJ', 'H5644HH'], "Details": [{"Summary": "Worked as a HR professional.", "Services_List": ["HR", "Marketing", "Hiring"],"On_payroll?": False}, {"Summary": "Worked as a UI Designer.", "Services_List": ["Frontend", "UI", "UX"],"On_payroll?": True}]}
original_df = pd.DataFrame(data)

Then, we'll create new details dataframe by converting the each row json to the list of json.

details_df = pd.DataFrame(original_df['Details'].tolist())

Finally, we can concat the details_df to the original_df.

new_df = pd.concat([original_df, details_df], axis=1)
new_df.drop(columns='Details', inplace=True) # To drop the 'Details' json column

Additionally, here is the step to handle the list value of the json row. This line will convert list value to the string.

new_df['Services_List'] = new_df.apply(lambda x: ', '.join(x['Services_List']), axis=1)
Chuu
  • 176
  • 7
  • It's giving incorrect results... – Patrik May 29 '23 at 15:30
  • I think it's easier to add one manual step for the array column like `new_df['Services_List'] = new_df.apply(lambda x: ', '.join(x['Services_List']), axis=1)` – Chuu May 29 '23 at 15:35
0

json_normalize helps in such cases.

https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html


In [1]: import pandas as pd

In [2]: df = pd.DataFrame(
   ...:     [['Lisa', 'FG546HJ', {"Summary": "Worked as a HR professional.", "Services_List": ["HR", "Marketing", "Hiring"],"On_payroll?": False}],
   ...:      ['Martin', 'H5644HH', {"Summary": "Worked as a UI Designer.", "Services_List": ["Frontend", "UI", "UX"],"On_payroll?": True}]
   ...:     ],
   ...:     columns=['Employee_Name', 'EID', 'Details']
   ...: )

In [3]: df.drop(columns='Details').join(pd.json_normalize(df.Details))
Out[3]: 
  Employee_Name      EID                       Summary            Services_List  On_payroll?
0          Lisa  FG546HJ  Worked as a HR professional.  [HR, Marketing, Hiring]        False
1        Martin  H5644HH      Worked as a UI Designer.       [Frontend, UI, UX]         True
valentinmk
  • 597
  • 7
  • 18
0

Try:

import json

# skip this step if values in Details are already parsed:
df['Details'] = df['Details'].apply(json.loads)

df = pd.concat([df, df.pop('Details').apply(pd.Series)], axis=1)
print(df)

Prints:

  Employee_Name      EID                       Summary            Services_List  On_payroll?
0          Lisa  FG546HJ  Worked as a HR professional.  [HR, Marketing, Hiring]        False
1        Martin  H5644HH      Worked as a UI Designer.       [Frontend, UI, UX]         True
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
0
import pandas as pd
import json

data = [
    {
        'Employee_Name': 'Lisa',
        'EID': 'FG546HJ',
        'Details': '{"Summary": "Worked as a HR professional.", "Services_List": ["HR", "Marketing", "Hiring"], "On_payroll?": false}'
    },
    {
        'Employee_Name': 'Martin',
        'EID': 'H5644HH',
        'Details': '{"Summary": "Worked as a UI Designer.", "Services_List": ["Frontend", "UI", "UX"], "On_payroll?": true}'
    }
]

# Create dataframe
df = pd.DataFrame(data)

# Parse the JSON in 'Details' column
df['Details'] = df['Details'].apply(json.loads)

# Expand the dictionary values into separate columns
df = pd.concat([df.drop('Details', axis=1), df['Details'].apply(pd.Series)], axis=1)

# Rename columns
df = df.rename(columns={'Summary': 'Summary', 'Services_List': 'Services_List', 'On_payroll?': 'On_Payroll?'})

# Print the resulting dataframe
df.head()
Anonymous
  • 835
  • 1
  • 5
  • 21