First post...
Pardon the naivete, but I'm struggling with what I thought would be a simple problem, based on this previous post: Splitting dictionary/list inside a Pandas Column into Separate Columns
I'm trying to split a column of a DataFrame loaded from a CSV file into separate columns, where the dictionary keys become the column labels and the values fill the rows.
Here are the first few rows of that column:
captionTime
0 {'startTimeMs': 113488, 'endTimeMs': 116700}
1 {'startTimeMs': 116742, 'endTimeMs': 121080}
2 {'startTimeMs': 121121, 'endTimeMs': 122706}
3 {'startTimeMs': 128462, 'endTimeMs': 129838}
When I run:
df2 = df['captionTime'].apply(pd.Series)
it returns a single column labelled 0 with the values unchanged, instead of two columns labelled 'startTimeMs' and 'endTimeMs':
0
0 {'startTimeMs': 113488, 'endTimeMs': 116700}
1 {'startTimeMs': 116742, 'endTimeMs': 121080}
2 {'startTimeMs': 121121, 'endTimeMs': 122706}
3 {'startTimeMs': 128462, 'endTimeMs': 129838}
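A minimal reproduction of that symptom, assuming the cells hold the text of each dictionary rather than real dict objects (the sample values are copied from the rows above). pd.Series built from a dict gets one entry per key, but pd.Series built from a plain string is a length-1 Series with index 0, which is exactly the single "0" column shown in the output.

```python
import pandas as pd

# Real dict objects: apply(pd.Series) expands each key into its own column
as_dicts = pd.Series([
    {'startTimeMs': 113488, 'endTimeMs': 116700},
    {'startTimeMs': 116742, 'endTimeMs': 121080},
])
expanded = as_dicts.apply(pd.Series)
print(expanded.columns.tolist())  # two columns, one per key

# The same values stored as text: each string becomes a length-1 Series,
# so the result is one column labelled 0 holding the unchanged strings
as_text = as_dicts.astype(str)
collapsed = as_text.apply(pd.Series)
print(collapsed.columns.tolist())  # [0]
```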
UPDATE
I was able to grab the original API code a colleague used to export the CSV file.
CSV Snippet:
captionTime,contentType,language,region,timedTextType
"{'startTimeMs': 5000, 'endTimeMs': 6708}",None,id,{},SUBS
"{'startTimeMs': 15875, 'endTimeMs': 19125}",None,id,{},SUBS
"{'startTimeMs': 19500, 'endTimeMs': 22875}",None,id,{},SUBS
"{'startTimeMs': 27791, 'endTimeMs': 30291}",None,id,{},SUBS
Out of curiosity, I tried my initial method on the data before it was written to CSV, and it worked with no problem. Since r.json() returns parsed Python objects (here, a dictionary), I'm assuming that when pandas reads the CSV back, it reads the captionTime column as a string rather than a dictionary.
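A quick way to test that assumption is to read the CSV snippet above and inspect the type of one cell. The sketch below uses io.StringIO in place of the real file. Note that the cell text is a Python dict repr with single quotes, not JSON, so json.loads would reject it, while ast.literal_eval from the standard library can turn it back into a dict.

```python
import ast
import io

import pandas as pd

# Two rows copied from the CSV snippet above
csv_text = """captionTime,contentType,language,region,timedTextType
"{'startTimeMs': 5000, 'endTimeMs': 6708}",None,id,{},SUBS
"{'startTimeMs': 15875, 'endTimeMs': 19125}",None,id,{},SUBS
"""

df = pd.read_csv(io.StringIO(csv_text))
first = df['captionTime'].iloc[0]
print(type(first))  # <class 'str'>: the CSV only stores the dict's text

# Parse the Python-literal text back into a real dict
parsed = ast.literal_eval(first)
print(parsed['startTimeMs'])  # 5000
```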
Input:
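import json
import pandas as pd

# session, endpoint, api, headers, body and params come from the colleague's API script (not shown)
r = session.post(f"{endpoint}{api}", headers=headers, data=json.dumps(body), params=params)
r.raise_for_status()
rDict = r.json()            # parsed JSON: each captionTime value is a real dict here
results = rDict['results']
df = pd.DataFrame(results)
df2 = df['captionTime'].apply(pd.Series)  # dicts expand into one column per key
print(df2)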
Output:
endTimeMs startTimeMs
0 6708 5000
1 19125 15875
2 22875 19500
3 30291 27791
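If you control the export step, one option is to expand the column while it still holds dicts and write plain numeric columns, so nobody has to parse the CSV later. A sketch, assuming results has the shape used above; the records below are hypothetical stand-ins built from the CSV snippet, with some fields omitted for brevity.

```python
import pandas as pd

# Hypothetical stand-in for rDict['results'], using values from the CSV snippet
results = [
    {'captionTime': {'startTimeMs': 5000, 'endTimeMs': 6708}, 'language': 'id', 'timedTextType': 'SUBS'},
    {'captionTime': {'startTimeMs': 15875, 'endTimeMs': 19125}, 'language': 'id', 'timedTextType': 'SUBS'},
]
df = pd.DataFrame(results)

# Expand the dicts before writing, then swap them in for the original column
times = df['captionTime'].apply(pd.Series)
flat = df.drop(columns='captionTime').join(times)

# to_csv with no path returns the CSV text; pass a filename to write a file instead
csv_out = flat.to_csv(index=False)
print(csv_out)
```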
I won't always be able to pull the data myself and may receive files from other colleagues, so how do I clean the file so that the dictionary column splits properly?
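For files you receive as-is, a minimal sketch is to parse the column while reading, using the converters parameter of pd.read_csv with ast.literal_eval, and then apply the original split. It assumes every captionTime cell holds a well-formed dict literal; a blank or malformed cell will make literal_eval raise. For a real file, pass its path instead of the io.StringIO object.

```python
import ast
import io

import pandas as pd

csv_text = """captionTime,contentType,language,region,timedTextType
"{'startTimeMs': 5000, 'endTimeMs': 6708}",None,id,{},SUBS
"{'startTimeMs': 15875, 'endTimeMs': 19125}",None,id,{},SUBS
"""

# converters runs ast.literal_eval on every captionTime cell as it is read,
# so the column holds real dicts instead of their text
df = pd.read_csv(io.StringIO(csv_text), converters={'captionTime': ast.literal_eval})

# The original approach now works: one column per key, joined back onto the rest
times = df['captionTime'].apply(pd.Series)
df2 = df.drop(columns='captionTime').join(times)
print(df2)
```

The same converter could be added for the region column, which is also a stringified dict in the snippet.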