You can try:
df[["ClientToken"]].join(df.Data.apply(lambda x: pd.Series(json.loads(x[1:-1]))))
Explanations:
- Select the
Data
column and apply the following steps:
- Because the "
Data
" content is wrapped in a list and this is a string, we can remove []
manually using x[1:-1]
(remove first and last character).
- Since the
"Data"
column is a string
and we actually want a JSON
, we need to convert it. One solution is to use the json.loads()
function from the json
module. The code becomes json.loads(x[1:-1])
- Then, convert the
dict
to a pd.Series
using pd.Series(json.loads(x[1:-1]))
- Add these new columns to the existing dataframe using
join
. Also, you will notice I used double []
to select the "ClientToken"
column as a dataframe.
Code + illustration:
import pandas as pd
import json
# step 1.1
print(df.Data.apply(lambda x: x[1:-1]))
# 0 {"summaryId":"4814223456","duration":952,"star...
# 1 {"summaryId":"4814239586","duration":132,"star...
# Name: Data, dtype: object
# step 1.2
print(df.Data.apply(lambda x: json.loads(x[1:-1])))
# 0 {'summaryId': '4814223456', 'duration': 952, '...
# 1 {'summaryId': '4814239586', 'duration': 132, '...
# Name: Data, dtype: object
# step 1.3
print(df.Data.apply(lambda x: pd.Series(json.loads(x[1:-1]))))
# summaryId duration startTime
# 0 4814223456 952 1587442919
# 1 4814239586 132 1587443876
# step 2
print(df[["ClientToken"]].join(df.Data.apply(lambda x: pd.Series(json.loads(x[1:-1])))))
# ClientToken summaryId duration startTime
# 0 7a9ee887-8a09-ff9592e08245 4814223456 952 1587442919
# 1 bac49563-2cf0-cb08e69daa48 4814239586 132 1587443876
Edit 1:
As it seems that there are some rows where the list
in Data
has multiple dicts
, you can try:
df[["ClientToken"]].join(df.Data.apply(lambda x: [pd.Series(y)
for y in json.loads(x)]) \
.explode() \
.apply(pd.Series))