-1

I have a json file that I splited into 2 column using pandas, and it looks like this :

 URL        report
0  121      [{'detected': True, 'name': 'Alfa', 'version' : '21.04'},{'detected': True, 'name': 'Alfa1'} ...
1  104      [{'detected': True, 'name': 'Beta', 'version' : '25.04.01'},{'detected': True, 'name': 'Beta3'}...
2  112      [{'detected': False, 'name': 'Alfa', 'version' : '21.04'},{'detected': True, 'name': 'Beta'}...

I want to make it look like this, removing the 'version':

URL        report_1                             report_2                              report_3
0  121     {'detected': True, 'name': 'Alfa'}   {'detected': True, 'name': 'Alfa1'} ...
1  104     {'detected': True, 'name': 'Beta'}   {'detected': True, 'name': 'Beta3'}...
2  112     {'detected': False, 'name': 'Alfa'}  {'detected': True, 'name': 'Beta'}...

The number of reports is not the same for every URL. Please if anyone could help me as I am struggling to find out how to split them. Many thanks in advance!

I honestly do not know what to try as I cannot seem to find a code that would work.

mozway
  • 194,879
  • 13
  • 39
  • 75
  • `df.explode('report')` then pivot by the two columns Q/A 10 [here](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe). – Quang Hoang Mar 27 '23 at 18:41

1 Answers1

0

As you have a Series of lists, you need to loop to remove the keys:

drop = {'version'}

df['report'] = [[{k: v for k, v in d.items() if k not in drop}
                 for d in l] for l in df['report']]

Output:

   URL                                                                     report
0  121  [{'detected': True, 'name': 'Alfa'}, {'detected': True, 'name': 'Alfa1'}]
1  104  [{'detected': True, 'name': 'Beta'}, {'detected': True, 'name': 'Beta3'}]
2  112  [{'detected': False, 'name': 'Alfa'}, {'detected': True, 'name': 'Beta'}]

Then for the final output, pivot:

(df.explode('report').assign(col=lambda d: d.groupby(level=0).cumcount().add(1))
   .pivot(index='URL', columns='col', values='report')
   .add_prefix('report_').reset_index().rename_axis(columns=None)
)

Output:

   URL                             report_1                             report_2
0  104   {'detected': True, 'name': 'Beta'}  {'detected': True, 'name': 'Beta3'}
1  112  {'detected': False, 'name': 'Alfa'}   {'detected': True, 'name': 'Beta'}
2  121   {'detected': True, 'name': 'Alfa'}  {'detected': True, 'name': 'Alfa1'}

Used input:

df = pd.DataFrame({'URL': [121, 104, 112],
                   'report': [[{'detected': True, 'name': 'Alfa', 'version' : '21.04'},
                               {'detected': True, 'name': 'Alfa1'}],
                              [{'detected': True, 'name': 'Beta', 'version' : '25.04.01'},
                               {'detected': True, 'name': 'Beta3'}],
                              [{'detected': False, 'name': 'Alfa', 'version' : '21.04'},
                               {'detected': True, 'name': 'Beta'}]],
                  })
mozway
  • 194,879
  • 13
  • 39
  • 75