Extract values from Python dataframe

Question

I have a Pandas dataframe in the following format:

ID|Date|Values
1234|2021-01-01|{"Reason":"Change", "New Value":"Segment 2", "Old Value":"Segment 1"}

I'd like to parse the values column and create a new dataframe:

ID|Date|Old|New

The order of the values is sometimes different. How can I extract these values in Python?

Does this answer your question? [Split / Explode a column of dictionaries into separate columns with pandas](https://stackoverflow.com/questions/38231591/split-explode-a-column-of-dictionaries-into-separate-columns-with-pandas) — David Kaftan, Jun 02 '21 at 21:36

score 0 · Answer 1 · answered Jun 02 '21 at 20:31

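# Assumes each cell in "Values" is already a dict, not a JSON string.
df[["Old", "New"]] = df["Values"].apply(lambda d: (d["Old Value"], d["New Value"])) \
                                 .tolist()
# Drop the raw column once both values have been extracted.
df = df.drop(columns="Values")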

Another method:

import operator

old = operator.itemgetter("Old Value")
new = operator.itemgetter("New Value")

df["Old"] = df["Values"].apply(old)
df["New"] = df["Values"].apply(new)
df = df.drop(columns="Values")

Corralien

The column is an object so I get "TypeError: string indices must be integers" – Mike Sarnoski Jun 02 '21 at 23:52
How to create your sample dataframe: `pd.DataFrame(...)`? Please can you add `df.info()` and `df.head()` output to your post. – Corralien Jun 03 '21 at 05:55
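The TypeError reported above suggests that the Values column holds JSON text rather than dict objects. A minimal sketch of one way around that, assuming every cell is a valid JSON string; the sample dataframe below is reconstructed from the question and is an assumption, not the asker's actual data:

```python
import json

import pandas as pd

# Hypothetical sample: "Values" holds JSON text, not dict objects (an assumption).
df = pd.DataFrame({
    "ID": [1234],
    "Date": ["2021-01-01"],
    "Values": ['{"Reason":"Change", "New Value":"Segment 2", "Old Value":"Segment 1"}'],
})

# Decode each JSON string into a dict; the order of keys in the text does not matter.
parsed = df["Values"].apply(json.loads)

# Look each value up by key, then drop the raw column.
df["Old"] = parsed.apply(lambda d: d["Old Value"])
df["New"] = parsed.apply(lambda d: d["New Value"])
df = df.drop(columns="Values")
```

If some cells are not valid JSON, json.loads raises an error, so real data may need cleaning first.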

score 0 · Answer 2 · answered Jun 02 '21 at 21:26

You can pass the list of dictionaries to pd.DataFrame() to expand them into columns. Select the two relevant columns from the resulting dataframe, join them to the original dataframe with .join(), and drop the original Values column with .drop().

df_new = df.drop('Values', axis=1).join(pd.DataFrame(df['Values'].tolist())[['Old Value', 'New Value']])

Note that using pd.DataFrame() to expand dictionaries into columns is typically one of the faster ways to do this. It is usually considerably faster than using .apply() with a lambda function.

Result:

print(df_new)


     ID        Date  Old Value  New Value
0  1234  2021-01-01  Segment 1  Segment 2

I get the following: "None of [Index(['Old Value', 'New Value'], dtype='object')] are in the [columns]" — Mike Sarnoski, Jun 02 '21 at 23:53
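That error is consistent with Values holding JSON strings: pd.DataFrame() on a list of strings produces a single unnamed column, so 'Old Value' and 'New Value' are not found. A hedged sketch of the same join approach with a decoding step added, also renaming the columns to the Old and New names the question asks for. The sample data is an assumption, and the second row is hypothetical, added only to show that key order does not matter:

```python
import json

import pandas as pd

# Hypothetical sample: JSON strings with keys in differing order (an assumption).
df = pd.DataFrame({
    "ID": [1234, 5678],
    "Date": ["2021-01-01", "2021-01-02"],
    "Values": [
        '{"Reason":"Change", "New Value":"Segment 2", "Old Value":"Segment 1"}',
        '{"Old Value":"Segment 2", "Reason":"Change", "New Value":"Segment 3"}',
    ],
})

# Decode first, so pd.DataFrame receives a list of dicts and builds one column per key.
expanded = pd.DataFrame(df["Values"].map(json.loads).tolist(), index=df.index)

# Keep only the two columns of interest, rename them, and join on the index.
df_new = df.drop(columns="Values").join(
    expanded[["Old Value", "New Value"]].rename(
        columns={"Old Value": "Old", "New Value": "New"}
    )
)
```

Passing index=df.index keeps the join aligned even if the original dataframe does not use the default 0..n-1 index.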
