How can I convert two lists into a dataframe, having one as a list of lists?

Question

I am reading my data from a YAML file, which gives me two lists.

One of the lists, named list_products, contains the product names:

['ABCD',
 'LTAP',
 'DEFG',
 'FFEE']

The other, named list_ids, contains the ids for each product; each element is itself a list of one or more ids:

[[100, 200],
 [3333],
 [1500,99, 870],
 [2]]

Working only with list_ids, I could build a dataframe. This is the code I used:

flat_list = [item for sublist in list_ids for item in sublist]
id_df = pd.DataFrame(flat_list,columns=['id'])

This gave me a dataframe with a single id column.

Now, I want to have a dataframe with the product name as well. I want to get this:

id       name
100      'ABCD'
200      'ABCD'
3333     'LTAP'
1500     'DEFG'
99       'DEFG'
870      'DEFG'
2        'FFEE'

score 4 · Accepted Answer · answered Sep 22 '20 at 19:07

You can either pre-process the data using zip, then build the DataFrame:

import pandas as pd

names = ['ABCD', 'LTAP', 'DEFG', 'FFEE']
list_ids = [[100, 200], [3333], [1500, 99, 870], [2]]

# Pair every id with the name of the product its sublist belongs to
flat_list = [(item, name) for sublist, name in zip(list_ids, names) for item in sublist]
id_df = pd.DataFrame(flat_list, columns=['id', 'name'])

The intermediate flat_list is

flat_list > [(100, 'ABCD'), (200, 'ABCD'), (3333, 'LTAP'), ...

Or build the DataFrame from the raw data, then use explode:

id_df = pd.DataFrame({'id': list_ids, 'name': names}).explode('id')

The intermediate pd.DataFrame({'id': list_ids, 'name': names}) is

                id  name
0       [100, 200]  ABCD
1           [3333]  LTAP
2  [1500, 99, 870]  DEFG
3              [2]  FFEE
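
One thing to be aware of with the explode route: the exploded rows keep the index of the row they came from (so 0, 0, 1, 2, 2, 2, 3 here), and the id column stays object dtype. If that matters, a minimal sketch of a tidied version, assuming every id is an integer and pandas is 0.25 or later, could look like this:

```python
import pandas as pd

names = ['ABCD', 'LTAP', 'DEFG', 'FFEE']
list_ids = [[100, 200], [3333], [1500, 99, 870], [2]]

id_df = (
    pd.DataFrame({'id': list_ids, 'name': names})
    .explode('id')             # one row per id; index values are repeated
    .reset_index(drop=True)    # replace the repeated index with 0..n-1
    .astype({'id': 'int64'})   # assumption: all ids are integers
)
```

The reset_index and astype steps are optional; drop them if the repeated index or object dtype is acceptable for your use.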

score 3 · Answer 2 · answered Sep 22 '20 at 19:06

Try with explode

import pandas as pd

l1 = ['ABCD',
      'LTAP',
      'DEFG',
      'FFEE']
l2 = [[100, 200],
      [3333],
      [1500, 99, 870],
      [2]]
# explode turns each element of the lists in col2 into its own row
out = pd.DataFrame({'col1': l1, 'col2': l2}).explode('col2')
out
   col1  col2
0  ABCD   100
0  ABCD   200
1  LTAP  3333
2  DEFG  1500
2  DEFG    99
2  DEFG   870
3  FFEE     2

Mayank Porwal · Answer 3 · 2020-09-22T19:27:10.717

If you don't want to use the explode function, as it is only available in pandas 0.25 and later, you can use the below:

In [473]: d = {i : list_ids[c] for c, i in enumerate(list_products)}
In [474]: df_out = pd.DataFrame([(var, key) for (key, L) in d.items() for var in L], columns=['id', 'name'])

In [475]: df_out
Out[475]: 
     id  name
0   100  ABCD
1   200  ABCD
2  3333  LTAP
3  1500  DEFG
4    99  DEFG
5   870  DEFG
6     2  FFEE
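
Note that the dict above is keyed by product name, so if the same name appeared twice in the list, the later entry would overwrite the earlier one. As a rough alternative that also avoids explode, here is a sketch that repeats each name once per id with numpy.repeat and flattens the ids with itertools.chain. It assumes the two lists have the same length and uses the list names from the question:

```python
from itertools import chain

import numpy as np
import pandas as pd

list_products = ['ABCD', 'LTAP', 'DEFG', 'FFEE']
list_ids = [[100, 200], [3333], [1500, 99, 870], [2]]

# Number of ids belonging to each product, e.g. [2, 1, 3, 1]
lengths = [len(sublist) for sublist in list_ids]

df_out = pd.DataFrame({
    # Flatten the nested id lists into one list
    'id': list(chain.from_iterable(list_ids)),
    # Repeat each product name once per id in its sublist
    'name': np.repeat(list_products, lengths),
})
```

Because nothing is keyed by name here, repeated product names keep their own rows.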
