
I wrote the following code (not sure if it's the best approach). The data is split into two separate lists that are already in the correct order: z[0] is the step and z[1] is the list of user ids for that step.

for i, z in enumerate(zip(steps, userids_list)):
    print(z)

This results in the following tuple values:

 # SAMPLE
(('Step 1 string', [list of userid of that step]),
 ('Step 2 string', [list of userid of that step]),
 ('Step 3 string', [list of userid of that step]),
 ('Step n string', [list of userids of that step]))

My goal is to transform that style of data into the following pandas DataFrame.

Column 1  Column 2
Step 1     User id
Step 1     User id
Step 2     User id 
Step 2     User id
Step 3     User id
Step 3     User id

Unfortunately, I couldn't find a way to transform the data into this shape. Any ideas on what I could try?

Peter Leimbigler
INGl0R1AM0R1

2 Answers


DataFrame.explode (available since pandas 0.25) is perfect for this. Load your two lists into a DataFrame and then explode the column containing the lists:

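import pandas as pd

df = pd.DataFrame({
    'Column 1': steps,         # one step string per row
    'Column 2': userids_list,  # one list of user ids per row
})

# Turn each list element into its own row, repeating the step value
df = df.explode('Column 2')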

For example:

import pandas as pd

steps = ['Step 1', 'Step 2', 'Step 3']
user_ids = [
    ['user a', 'user b'],
    ['user a', 'user b', 'user c'],
    ['user c'],
]

df = pd.DataFrame({
    'step': steps,
    'user_id': user_ids,
})

df = df.explode('user_id').reset_index(drop=True)
print(df)

Output:

     step user_id
0  Step 1  user a
1  Step 1  user b
2  Step 2  user a
3  Step 2  user b
4  Step 2  user c
5  Step 3  user c
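Two details may matter with real data. On pandas 1.1 or later, explode accepts ignore_index=True, which has the same effect as the reset_index(drop=True) call above. Also, a step whose list is empty is not dropped: explode keeps it as a row with NaN in the exploded column. A minimal sketch, with hypothetical data where 'Step 2' has no user ids:

```python
import pandas as pd

steps = ['Step 1', 'Step 2', 'Step 3']
# Hypothetical data: 'Step 2' has no user ids
user_ids = [['user a', 'user b'], [], ['user c']]

df = pd.DataFrame({'step': steps, 'user_id': user_ids})

# ignore_index=True (pandas >= 1.1) numbers the result 0..n-1,
# the same effect as .reset_index(drop=True)
exploded = df.explode('user_id', ignore_index=True)
print(exploded)

# The empty list became a NaN row; drop it if steps without users aren't wanted
cleaned = exploded.dropna(subset=['user_id']).reset_index(drop=True)
print(cleaned)
```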
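import pandas as pd

# Same shape as the printed tuples: (step string, list of user ids).
# The user ids here are illustrative placeholders.
data = (('Step 1 string', ['user a', 'user b']),
        ('Step 2 string', ['user a', 'user b', 'user c']),
        ('Step 3 string', ['user c']))

df = pd.DataFrame(data, columns=['Column 1', 'Column 2'])

This builds one row per step, with that step's list of user ids in 'Column 2'. To get one row per user id, as in the desired output, follow it with df = df.explode('Column 2').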
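If you'd rather not use explode at all, another option is to flatten the zipped pairs in plain Python and hand pandas one (step, user id) row each. This is a minimal sketch; the steps and userids_list values below are illustrative stand-ins for the question's two lists. Note that enumerate isn't needed, since the index i is never used:

```python
import pandas as pd

# Illustrative inputs in the same shape as the question's two lists
steps = ['Step 1', 'Step 2', 'Step 3']
userids_list = [
    ['user a', 'user b'],
    ['user a', 'user b', 'user c'],
    ['user c'],
]

# One (step, user_id) pair per user id, keeping the original order
rows = [
    (step, user_id)
    for step, user_ids in zip(steps, userids_list)
    for user_id in user_ids
]

df = pd.DataFrame(rows, columns=['Column 1', 'Column 2'])
print(df)
```

One behavioral difference from explode: a step with an empty list produces no rows here, whereas explode keeps it as a row with NaN.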