1

I am beginner of programming language, so it would be appreciated you help and support.

Here is DataFrame and one column' data is JSON type? of data.

ID, Name, Information
1234, xxxx, '{'age': 25, 'gender': 'male'}'
2234, yyyy, '{'age': 34, 'gender': 'female'}'
3234, zzzz, '{'age': 55, 'gender': 'male'}'

I would like to covert this DataFrame as below.

ID, Name, age, gender
1234, xxxx, 25, male
2234, yyyy, 34, female
3234, zzzz, 55, male

I found that ast.literal_eval() can convert str to dict type, but I have no idea how to write code of this issue.

Would you please give some example of a code which can solve this issue?

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
montjeu
  • 39
  • 7

2 Answers2

0
  1. If third column is a JSON string, ' is not valid, it should be ", so we need to fix this.
  2. If the third column is a string representation of python dict, you can use eval to convert it.

A sample of code to split third column of type dict and merge into the original DataFrame:

data = [
  [1234, 'xxxx', "{'age': 25, 'gender': 'male'}"],
  [2234, 'yyyy', "{'age': 34, 'gender': 'female'}"],
  [3234, 'zzzz', "{'age': 55, 'gender': 'male'}"],
]

df = pd.DataFrame().from_dict(data)

df[2] = df[2].apply(lambda x: json.loads(x.replace("'", '"'))) # fix the data and convert to dict
merged = pd.concat([df[[0, 1]], df[2].apply(pd.Series)], axis=1)
Kassian Sun
  • 609
  • 4
  • 8
0

Given test.csv

ID,Name,Information
1234,xxxx,"{'age': 25, 'gender': 'male'}"
2234,yyyy,"{'age': 34, 'gender': 'female'}"
3234,zzzz,"{'age': 55, 'gender': 'male'}"
  • Read the file in with pd.read_csv and use the converters parameter with ast.literal_eval, which will convert the data in the Information column from a str type to dict type.
  • Use pd.json_normalize to unpack the dict with keys as column headers and values in the rows
  • .join the normalized columns with df
  • .drop the Information column
import pandas as pd
from ast import literal_eval

df = pd.read_csv('test.csv', converters={'Information': literal_eval})

df = df.join(pd.json_normalize(df.Information))

df.drop(columns=['Information'], inplace=True)

# display(df)
     ID  Name  age  gender
0  1234  xxxx   25    male
1  2234  yyyy   34  female
2  3234  zzzz   55    male

If the data is not from a csv file

import pandas as pd
from ast import literal_eval

data = {'ID': [1234, 2234, 3234],
        'Name': ['xxxx', 'yyyy', 'zzzz'],
        'Information': ["{'age': 25, 'gender': 'male'}", "{'age': 34, 'gender': 'female'}", "{'age': 55, 'gender': 'male'}"]}

df = pd.DataFrame(data)

# apply literal_eval to Information
df.Information = df.Information.apply(literal_eval)

# normalize the Information column and join to df
df = df.join(pd.json_normalize(df.Information))

# drop the Information column
df.drop(columns=['Information'], inplace=True)
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
  • 1
    Hi Tranton, Thank you for your answer with detail information. Those detail help me to learn python much easier. Problem solved in CSV way you posted first. Thank you. – montjeu Aug 14 '20 at 00:35