1

Consider the following example

data1 = [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]
data2 = [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]


dftest = pd.DataFrame({'weirdjson' : [data1, data2]})
dftest['normalcol'] = 1

dftest

Out[79]: 
                                                                                        weirdjson  normalcol  time_type_one  time_type_two
0  [{'type': 'one', 'delta': '1', 'time': '2019'}, {'type': 'two', 'delta': '1', 'time': '2018'}]          1           2019           2018
1  [{'type': 'one', 'delta': '1', 'time': '2013'}, {'type': 'two', 'delta': '1', 'time': '2012'}]          1           2013           2012

Essentially, I would like to create two columns time_type_one and time_type_two that each contain their corresponding time value (for the first row: 2019 for type one and 2018 for type two).

How can I do that in Pandas? I have many rows so I am looking for something very efficient. Thanks!

ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
  • I did not downvote, but I think people downvoted since there's no attempt shown. Plus if we copy this dataframe, it will be copied as string column, instead of list of dictionary's. So it would be better to include your sample dataframe as copyable code with `pd.DataFrame(..)` – Erfan Dec 29 '19 at 20:27
  • good point. let me change that – ℕʘʘḆḽḘ Dec 29 '19 at 20:27
  • Since youre at it. It's also best to include an expected output in the form of a dataframe, this way people can visually see what you try to do. – Erfan Dec 29 '19 at 20:28
  • I think you can take a look at this: https://stackoverflow.com/questions/39899005/how-to-flatten-a-pandas-dataframe-with-some-columns-as-json – E. Zeytinci Dec 29 '19 at 21:32
  • question updated! – ℕʘʘḆḽḘ Dec 29 '19 at 21:33

3 Answers3

1

Try this:

import json
import pandas as pd

data = [{'normalcol':1, 'weirdjsoncol':'[{"type": "one", "delta": "1", "time": "2019"}, {"type": "two", "delta": "1", "time": "2018"}]'}, {'normalcol':2, 'weirdjsoncol':'[{"type": "two", "delta": "1", "time": "2017"}, {"type": "one", "delta": "1", "time": "2013"}]'}]

df = pd.DataFrame(data)

df['time_type_one'] = df['weirdjsoncol'].apply(lambda x: next((i for i in json.loads(x) if i["type"] == "one"), None)["time"])

df['time_type_two'] = df['weirdjsoncol'].apply(lambda x: next((i for i in json.loads(x) if i["type"] == "two"), None)["time"])

Zeeshan
  • 1,078
  • 9
  • 14
1

You can try this:

df_new = pd.DataFrame().append([x[y] for x in dftest.weirdjson for y in range(len(dftest.weirdjson))])
df_new = df_new.pivot(columns='type', values=['delta', 'time']).apply(lambda x: pd.Series(x.dropna().values)) 
df_new.columns = ['_'.join(col) for col in df_new.columns.values] 

  delta_one delta_two time_one time_two
0         1         1     2019     2018
1         1         1     2013     2017

oppressionslayer
  • 6,942
  • 2
  • 7
  • 24
  • I updated the second line, so should work now with your dftest, i forgot the df_new = – oppressionslayer Dec 29 '19 at 22:03
  • I'm not sure how fast pivot is, i can do it another way without pivot, where i aggregrate the values as lists and expand them out, but it's a few more steps, but with shorter lines – oppressionslayer Dec 29 '19 at 22:05
1

You may use explode, and construct a new dataframe and unstack type to columns as follows:

s = dftest.weirdjson.explode()
df_new = (pd.DataFrame({'type': s.str['type'], 'time': s.str['time']}) 
            .set_index('type', append=True).time.unstack().add_prefix('time_type_'))

Out[461]:
type time_type_one time_type_two
0             2019          2018
1             2013          2012
Andy L.
  • 24,909
  • 4
  • 17
  • 29