I have data in a pandas DataFrame and I'm trying to separate and extract data out of a specific column, col. The values in col are all lists of various sizes that store 4-value tuples (previously dictionaries with four key-value pairs). The values are always in the same relative order within each tuple.
For each of those tuples, I'd like a separate row in the final DataFrame, with each of the tuple's four values stored in its own new column.
The DataFrame df looks like this:
ID  col
A   [(123, 456, 111, False), (124, 456, 111, True), (125, 456, 111, False)]
B   []
C   [(123, 555, 333, True)]
I need to split col into four columns and also lengthen the DataFrame so that each tuple gets its own row in df2. The DataFrame df2 should look like this:
ID  col1  col2  col3  col4
A   123   456   111   False
A   124   456   111   True
A   125   456   111   False
B   None  None  None  None
C   123   555   333   True
I have some loop-based workaround code that seems to get the job done, but I'd like to find a better, more efficient way that I can run on a huge data set, perhaps using vectorization or NumPy if possible. Here's what I have so far:
import pandas as pd

df = pd.DataFrame({'ID': ['A', 'B', 'C'],
                   'col': [[('123', '456', '111', False),
                            ('124', '456', '111', True),
                            ('125', '456', '111', False)],
                           [],
                           [('123', '555', '333', True)]]
                   })

final_rows = []
for index, row in df.iterrows():
    if not row.col:  # empty list: keep the ID and pad the four value columns with None
        final_rows.append([row.ID, None, None, None, None])
    for tup in row.col:
        new_row = [row.ID]
        vals = list(tup)
        new_row.extend(vals)
        final_rows.append(new_row)

df2 = pd.DataFrame(final_rows, columns=['ID', 'col1', 'col2', 'col3', 'col4'])
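
For comparison, one possible direction that avoids iterrows is sketched below. It assumes pandas 0.25 or later, where DataFrame.explode is available, and it has not been benchmarked on a large data set. The names new_cols, empty, exploded, and df2_alt are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['A', 'B', 'C'],
    'col': [[('123', '456', '111', False),
             ('124', '456', '111', True),
             ('125', '456', '111', False)],
            [],
            [('123', '555', '333', True)]],
})

new_cols = ['col1', 'col2', 'col3', 'col4']  # illustrative column names
empty = (None,) * len(new_cols)              # placeholder row for empty lists

# explode() gives every list element (here, every tuple) its own row;
# a row whose list is empty becomes a single row holding NaN.
exploded = df.explode('col').reset_index(drop=True)

# Replace the NaN placeholders with an all-None tuple so every entry has
# four fields, then expand the tuples into four columns in one call.
tuples = [t if isinstance(t, tuple) else empty for t in exploded['col']]
values = pd.DataFrame(tuples, columns=new_cols, index=exploded.index)

# Put the ID column back next to the expanded values.
df2_alt = pd.concat([exploded[['ID']], values], axis=1)
print(df2_alt)
```

The list comprehension is still a Python-level pass over the rows, so this is not fully vectorized, but it builds the four new columns in a single DataFrame constructor call instead of appending row by row.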