I started out with a deeply nested JSON file (a sample of one of the records can be found here) and was able to turn it into the result below:
TopicId,CallID,"BudgetTopicActionMap,12345","BudgetTopicActionMap,12348","BudgetTopicActionMap,12350","BudgetTopicActionMap,17659"
COMPET-09-2014,H2020-COMPET-2014,,"[{'action': 'COMPET-09-2014 - CSA Coordination and support action', 'plannedOpeningDate': '11 December 2013', 'deadlineModel': 'single-stage', 'deadlineDates': ['26 March 2014'], 'budgetYearMap': {'0': 53040000}, 'budgetTopicActionMap': {}}, {'action': 'COMPET-06-2014 - RIA Research and Innovation action', 'plannedOpeningDate': '11 December 2013', 'deadlineModel': 'single-stage', 'deadlineDates': ['26 March 2014'], 'budgetYearMap': {'0': 53040000}, 'budgetTopicActionMap': {}}, {'action': 'COMPET-05-2014 - CSA Coordination and support action', 'plannedOpeningDate': '11 December 2013', 'deadlineModel': 'single-stage', 'deadlineDates': ['26 March 2014'], 'budgetYearMap': {'0': 53040000}, 'budgetTopicActionMap': {}}]",,
COMPET-11-2014,H2020-COMPET-2014,,"[{'action': 'COMPET-02-2014 - RIA Research and Innovation action', 'plannedOpeningDate': '11 December 2013', 'deadlineModel': 'single-stage', 'deadlineDates': ['26 March 2014'], 'budgetYearMap': {'0': 53040000}, 'budgetTopicActionMap': {}}, {'action': 'COMPET-11-2014 - CSA Coordination and support action', 'plannedOpeningDate': '11 December 2013', 'deadlineModel': 'single-stage', 'deadlineDates': ['26 March 2014'], 'budgetYearMap': {'0': 53040000}, 'budgetTopicActionMap': {}}, {'action': 'COMPET-01-2014 - RIA Research and Innovation action', 'plannedOpeningDate': '11 December 2013', 'deadlineModel': 'single-stage', 'deadlineDates': ['26 March 2014'], 'budgetYearMap': {'0': 53040000}, 'budgetTopicActionMap': {}}]",,
COMPET-1-2016,H2020-COMPET-2016,,,"[{'action': 'COMPET-3-2016-a - IA Innovation action', 'plannedOpeningDate': '10 November 2015', 'deadlineModel': 'single-stage', 'deadlineDates': ['03 March 2016'], 'budgetYearMap': {'2016': 18000000}, 'budgetTopicActionMap': {}}]","[{'action': 'COMPET-2-2016 - RIA Research and Innovation action', 'plannedOpeningDate': '10 November 2015', 'deadlineModel': 'single-stage', 'deadlineDates': ['03 March 2016'], 'budgetYearMap': {'2016': 7000000}, 'budgetTopicActionMap': {}}]"
COMPET-2-2016,H2020-COMPET-2016,,,"[{'action': 'COMPET-3-2016-a - IA Innovation action', 'plannedOpeningDate': '10 November 2015', 'deadlineModel': 'single-stage', 'deadlineDates': ['03 March 2016'], 'budgetYearMap': {'2016': 18000000}, 'budgetTopicActionMap': {}}]","[{'action': 'COMPET-2-2016 - RIA Research and Innovation action', 'plannedOpeningDate': '10 November 2015', 'deadlineModel': 'single-stage', 'deadlineDates': ['03 March 2016'], 'budgetYearMap': {'2016': 7000000}, 'budgetTopicActionMap': {}}]"
I got to this point by dropping some columns and using:
df = pd.json_normalize(list, sep = ",")
The problem is that this created roughly 200 "BudgetTopicActionMap" columns which, for each record, either contain a list of dictionaries or are empty. I have not been able to flatten these further: I need to create a new column for every key, but a single cell often holds multiple dictionaries with the same keys. In addition, some records have multiple non-empty BudgetTopicActionMap columns, which would require a melt operation of some sort to turn the data into long format instead (and I guess I'd need a melt/explode for the repeated keys per cell as well).
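Conceptually, I have something like the sketch below in mind, but I haven't managed to make it work on the real data (here `wide` stands for the dataframe shown at the top, the column names come from that sample, and I'm assuming every non-empty "BudgetTopicActionMap,…" cell holds a list of dictionaries, stored as strings if read back from CSV):

import ast
import pandas as pd

# melt the ~200 "BudgetTopicActionMap,<id>" columns into long format
value_cols = [c for c in wide.columns if c.startswith("BudgetTopicActionMap")]
long = (wide.melt(id_vars=["TopicId", "CallID"], value_vars=value_cols,
                  value_name="actions")
            .drop(columns="variable")
            .dropna(subset=["actions"]))  # keep only the non-empty cells

# parse the stringified lists (only needed if the data was round-tripped through CSV)
long["actions"] = long["actions"].apply(
    lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

# one row per dictionary in each list, then flatten the dictionaries into columns
long = long.explode("actions", ignore_index=True)
flat = pd.concat([long.drop(columns="actions"),
                  pd.json_normalize(long["actions"].tolist())], axis=1)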
This would be my desired result:
identifier,callIdentifier,action,plannedOpeningDate,deadlineModel,deadlineDates,budgetYearMap,budgetTopicActionMap
COMPET-09-2014,H2020-COMPET-2014,COMPET-09-2014 - CSA Coordination and support action,2013-12-11,single-stage,['26 March 2014'],{'0': 53040000},{}
COMPET-09-2014,H2020-COMPET-2014,COMPET-06-2014 - RIA Research and Innovation action,2013-12-11,single-stage,['26 March 2014'],{'0': 53040000},{}
COMPET-09-2014,H2020-COMPET-2014,COMPET-05-2014 - CSA Coordination and support action,2013-12-11,single-stage,['26 March 2014'],{'0': 53040000},{}
COMPET-11-2014,H2020-COMPET-2014,COMPET-02-2014 - RIA Research and Innovation action,2013-12-11,single-stage,['26 March 2014'],{'0': 53040000},{}
COMPET-11-2014,H2020-COMPET-2014,COMPET-11-2014 - CSA Coordination and support action,2013-12-11,single-stage,['26 March 2014'],{'0': 53040000},{}
COMPET-11-2014,H2020-COMPET-2014,COMPET-01-2014 - RIA Research and Innovation action,2013-12-11,single-stage,['26 March 2014'],{'0': 53040000},{}
COMPET-1-2016,H2020-COMPET-2016,COMPET-3-2016-a - IA Innovation action,2015-11-10,single-stage,['03 March 2016'],{'2016': 18000000},{}
COMPET-1-2016,H2020-COMPET-2016,COMPET-2-2016 - RIA Research and Innovation action,2015-11-10,single-stage,['03 March 2016'],{'2016': 7000000},{}
COMPET-2-2016,H2020-COMPET-2016,COMPET-3-2016-a - IA Innovation action,2015-11-10,single-stage,['03 March 2016'],{'2016': 18000000},{}
COMPET-2-2016,H2020-COMPET-2016,COMPET-2-2016 - RIA Research and Innovation action,2015-11-10,single-stage,['03 March 2016'],{'2016': 7000000},{}
The dictionaries are flattened into their own columns (based on their keys), and where a cell contains multiple dictionaries with the same keys, additional rows are created. When, for a given identifier and callIdentifier, there are multiple non-empty BudgetTopicActionMap columns, each of those columns gets its own rows as well.
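As a minimal illustration of that row expansion (toy data, not my real columns):

import pandas as pd

toy = pd.DataFrame({
    "identifier": ["T1"],
    "col_a": [[{"action": "a1"}, {"action": "a2"}]],  # two dicts in one cell
    "col_b": [[{"action": "b1"}]],                    # second non-empty column
})

long = (toy.melt(id_vars="identifier", value_name="actions")
           .drop(columns="variable")
           .explode("actions", ignore_index=True))
flat = pd.concat([long.drop(columns="actions"),
                  pd.json_normalize(long["actions"].tolist())], axis=1)
# flat has three rows (a1, a2, b1), all keeping identifier "T1"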
I have tried a lot of approaches mentioned here on SO (like this one or this one), but somehow I can't seem to combine them to get to the desired result: either my dataframe does not change at all, or it changes for the worse. It seems I need to both flatten and melt, and I found several answers for that, but they were all geared toward one or two columns rather than 200, or did not take the repeated keys per cell into account.
Any ideas on how to approach this?