I load my data from a YAML file and end up with two lists.

The first list, named list_products, contains the product names:

['ABCD',
 'LTAP',
 'DEFG',
 'FFEE']

The other, named list_ids, contains the ids for each product. Each element is itself a list holding one or more ids:

[[100, 200],
 [3333],
 [1500, 99, 870],
 [2]]

Working only with list_ids, I could build a dataframe with this code:

import pandas as pd

flat_list = [item for sublist in list_ids for item in sublist]
id_df = pd.DataFrame(flat_list, columns=['id'])

And this was the result:

id
100
200
3333
1500
99
870
2

Now I want the dataframe to include the product name as well, like this:

id       name
100      'ABCD'
200      'ABCD'
3333     'LTAP'
1500     'DEFG'
99       'DEFG'
870      'DEFG'
2        'FFEE'
Mayank Porwal
dekio

3 Answers

  1. You can either pre-process the data with zip and then build the DataFrame:

    import pandas as pd
    
    names = ['ABCD', 'LTAP', 'DEFG', 'FFEE']
    list_ids = [[100, 200], [3333], [1500, 99, 870], [2]]
    
    flat_list = [(item, name) for sublist, name in zip(list_ids, names) for item in sublist]
    id_df = pd.DataFrame(flat_list, columns=['id', 'name'])
    

    The intermediate flat_list is:

    flat_list > [(100, 'ABCD'), (200, 'ABCD'), (3333, 'LTAP'), ...
    

  2. Or build the DataFrame from the raw lists, then use explode:

    id_df = pd.DataFrame({'id': list_ids, 'name': names}).explode('id')
    

    The intermediate pd.DataFrame({'id': list_ids, 'name': names}) is:

                    id  name
    0       [100, 200]  ABCD
    1           [3333]  LTAP
    2  [1500, 99, 870]  DEFG
    3              [2]  FFEE
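
The two approaches do not produce quite the same frame. After explode, the row index repeats (0, 0, 1, 2, ...) and the id column has object dtype. The sketch below assumes pandas 0.25 or later and uses the illustrative names df_zip and df_explode (not from the answer). It shows one way to normalize the exploded frame so that it lines up with the zip-based one:

```python
import pandas as pd

names = ['ABCD', 'LTAP', 'DEFG', 'FFEE']
list_ids = [[100, 200], [3333], [1500, 99, 870], [2]]

# Approach 1: flatten with zip first, then build the frame.
pairs = [(item, name) for sublist, name in zip(list_ids, names) for item in sublist]
df_zip = pd.DataFrame(pairs, columns=['id', 'name'])

# Approach 2: build the frame from the nested lists, then explode.
df_explode = pd.DataFrame({'id': list_ids, 'name': names}).explode('id')

# explode keeps the original row labels and leaves 'id' as object dtype,
# so reset the index and cast back to integers before comparing.
df_explode = df_explode.reset_index(drop=True).astype({'id': 'int64'})

print(df_zip)
print(df_explode)
```

Whether the extra normalization matters depends on what you do next. If the ids are only displayed, it probably does not, but for joins or arithmetic an integer dtype is usually safer.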
    
azro

Try it with explode:

l1 = ['ABCD',
'LTAP',
'DEFG',
'FFEE']
l2 = [[100, 200],
 [3333],
 [1500,99, 870],
 [2]]
out = pd.DataFrame({'col1':l1,'col2':l2}).explode('col2')
out
   col1  col2
0  ABCD   100
0  ABCD   200
1  LTAP  3333
2  DEFG  1500
2  DEFG    99
2  DEFG   870
3  FFEE     2
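
The repeated index labels (0, 0, 1, 2, ...) in the output above are the labels of the original rows. As a sketch, assuming pandas 1.1 or later (where explode accepts ignore_index), the result can be renumbered and reshaped into the id/name layout the question asks for. The rename mapping is only this example's choice:

```python
import pandas as pd

l1 = ['ABCD', 'LTAP', 'DEFG', 'FFEE']
l2 = [[100, 200], [3333], [1500, 99, 870], [2]]

out = (
    pd.DataFrame({'col1': l1, 'col2': l2})
    # ignore_index=True gives a fresh 0..n-1 index instead of repeated labels
    .explode('col2', ignore_index=True)
    # use the column names from the question
    .rename(columns={'col1': 'name', 'col2': 'id'})
    # put 'id' first, as in the desired output
    .loc[:, ['id', 'name']]
)
print(out)
```

On older pandas versions without ignore_index, calling reset_index(drop=True) after explode should have the same effect.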
BENY

If you don't want to use explode, which is only available in pandas 0.25 and later, you can use the following:

In [473]: d = {i : list_ids[c] for c, i in enumerate(list_products)}
In [474]: df_out = pd.DataFrame([(var, key) for (key, L) in d.items() for var in L], columns=['id', 'name'])

In [475]: df_out
Out[475]: 
     id  name
0   100  ABCD
1   200  ABCD
2  3333  LTAP
3  1500  DEFG
4    99  DEFG
5   870  DEFG
6     2  FFEE
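
One caveat with the dictionary approach: if a product name appears more than once, later entries overwrite earlier ones, because dict keys are unique. A possible alternative that also avoids explode is sketched here with itertools.chain and numpy.repeat (neither is used in the answer above). It repeats each name once per id instead:

```python
from itertools import chain

import numpy as np
import pandas as pd

list_products = ['ABCD', 'LTAP', 'DEFG', 'FFEE']
list_ids = [[100, 200], [3333], [1500, 99, 870], [2]]

# How many ids belong to each product, e.g. [2, 1, 3, 1].
lengths = [len(ids) for ids in list_ids]

df_out = pd.DataFrame({
    # Flatten the nested id lists into one flat list.
    'id': list(chain.from_iterable(list_ids)),
    # Repeat each product name once for every id it owns.
    'name': np.repeat(list_products, lengths),
})
print(df_out)
```

Because the pairing is positional rather than keyed by name, duplicate product names are kept as separate rows.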
Mayank Porwal