I have a DataFrame with a column containing lists of dictionaries, which looks like this:
[{'first_open_time': {'int_value': '1652796000000', 'set_timestamp_micros': '1652792823456000'}}, {'User_dedication': {'string_value': '1', 'set_timestamp_micros': '1653137417352000'}}, {'User_activity': {'string_value': '1', 'set_timestamp_micros': '1653136561498000'}}, {'Minutes_in_app': {'string_value': '60_300', 'set_timestamp_micros': '1653137417352000'}}, {'ga_session_number': {'int_value': '10', 'set_timestamp_micros': '1653136552555000'}}, {'Paying_user': {'string_value': '0', 'set_timestamp_micros': '1653136561498000'}}, {'ga_session_id': {'int_value': '1653136552', 'set_timestamp_micros': '1653136552555000'}}]
[{'User_dedication': {'string_value': '1', 'set_timestamp_micros': '1653137166688000'}}, {'User_activity': {'string_value': '1', 'set_timestamp_micros': '1653136561498000'}}, {'Minutes_in_app': {'string_value': '60_300', 'set_timestamp_micros': '1653137166688000'}}, {'Paying_user': {'string_value': '0', 'set_timestamp_micros': '1653136561498000'}}, {'ga_session_id': {'int_value': '1653136552', 'set_timestamp_micros': '1653136552555000'}}, {'ga_session_number': {'int_value': '10', 'set_timestamp_micros': '1653136552555000'}}, {'first_open_time': {'int_value': '1652796000000', 'set_timestamp_micros': '1652792823456000'}}]
[{'Minutes_in_app': {'string_value': '60_300', 'set_timestamp_micros': '1653137288213000'}}, {'Paying_user': {'string_value': '0', 'set_timestamp_micros': '1653136561498000'}}, {'first_open_time': {'int_value': '1652796000000', 'set_timestamp_micros': '1652792823456000'}}, {'User_dedication': {'string_value': '1', 'set_timestamp_micros': '1653137288213000'}}, {'User_activity': {'string_value': '1', 'set_timestamp_micros': '1653136561498000'}}, {'ga_session_number': {'int_value': '10', 'set_timestamp_micros': '1653136552555000'}}, {'ga_session_id': {'int_value': '1653136552', 'set_timestamp_micros': '1653136552555000'}}]
[{'first_open_time': {'int_value': '1653195600000', 'set_timestamp_micros': '1653193960416000'}}]
[{'ga_session_number': {'int_value': '3', 'set_timestamp_micros': '1653165977727000'}}, {'User_activity': {'string_value': '1_10', 'set_timestamp_micros': '1653109414730000'}}, {'Minutes_in_app': {'string_value': '1_10', 'set_timestamp_micros': '1653109414735000'}}, {'first_open_time': {'int_value': '1653102000000', 'set_timestamp_micros': '1653098744032000'}}, {'User_dedication': {'string_value': '1', 'set_timestamp_micros': '1653109414734000'}}, {'ga_session_id': {'int_value': '1653165977', 'set_timestamp_micros': '1653165977727000'}}]
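
For reference, here is a shortened, reproducible version of the DataFrame (the column name event_params is just a placeholder I use below):

import pandas as pd

rows = [
    [{'first_open_time': {'int_value': '1652796000000', 'set_timestamp_micros': '1652792823456000'}},
     {'User_dedication': {'string_value': '1', 'set_timestamp_micros': '1653137417352000'}},
     {'ga_session_number': {'int_value': '10', 'set_timestamp_micros': '1653136552555000'}}],
    [{'first_open_time': {'int_value': '1653195600000', 'set_timestamp_micros': '1653193960416000'}}],
]
df = pd.DataFrame({'event_params': rows})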
I expected json_normalize to put the data into columns like
first_open_time.int_value, first_open_time.set_timestamp_micros, User_dedication.string_value, User_dedication.set_timestamp_micros, etc.
Instead, it just split the data into 7 columns that still contain dictionaries:
{'first_open_time.int_value': '1652796000000', 'first_open_time.set_timestamp_micros': '1652792823456000'} {'User_dedication.string_value': '1', 'User_dedication.set_timestamp_micros': '1653137417352000'} {'User_activity.string_value': '1', 'User_activity.set_timestamp_micros': '1653136561498000'}
That looks almost like what I need, but the values are still dictionaries. Also, the dictionaries appear in a different order in some rows, as in the first example above.
I tried specifying meta (as I understood it from some tutorials):
df3 = pd.json_normalize(df3,
meta=[['first_open_time', 'int_value'], ['first_open_time', 'set_timestamp_micros'],
['User_dedication', 'string_value'], ['User_dedication', 'set_timestamp_micros'],
['User_activity', 'string_value'], ['User_activity', 'set_timestamp_micros'],
['Minutes_in_app', 'string_value'], ['Minutes_in_app', 'set_timestamp_micros'],
['ga_session_number', 'int_value'], ['ga_session_number', 'set_timestamp_micros'],
['Paying_user', 'string_value'], ['Paying_user', 'set_timestamp_micros'],
['ga_session_id', 'int_value'], ['ga_session_id', 'set_timestamp_micros']])
But it raises AttributeError: 'list' object has no attribute 'values'. I suspect this is because json_normalize expects each record to be a dict, while each of my records is itself a list of dicts.
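
As far as I understand the docs, meta is only meant to be used together with record_path, when each record is a dict that contains a nested list, something like:

data = [{'user': 'a', 'events': [{'t': 1}, {'t': 2}]}]
pd.json_normalize(data, record_path='events', meta=['user'])
#    t user
# 0  1    a
# 1  2    a

My records are bare lists with no surrounding dict, so maybe meta does not apply here at all.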
Part of the problem may also be that the dictionaries appear in a different order in some rows, and some rows contain fewer dictionaries than others; that is how BigQuery exports the events.
Is there a way to solve this? Maybe by sorting the dictionaries in every row so they are all in the same order, or by specifying each column explicitly and which value should go into it?
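
One idea I had (a rough sketch, using the placeholder column name from above) is to merge each row's list of single-key dictionaries into one flat dictionary first, so the order and the number of dictionaries per row stop mattering, and only then normalize:

# Merge each row's single-key dicts into one dict per row
flat = df['event_params'].apply(lambda dicts: {k: v for d in dicts for k, v in d.items()})

# Normalize the merged dicts into dotted columns like 'first_open_time.int_value';
# keys missing from a row simply become NaN in that row
wide = pd.json_normalize(flat.tolist())

Would that be a reasonable approach, or is there a cleaner way to do it directly with json_normalize?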