I have a pandas DataFrame with a column in which each row holds an array of 4 values, and I want to split that column into 4 separate columns. In other words, I want to pivot it without keeping the original column name.
I already followed the official pandas reshaping guide.
Input
import pandas as pd
df = pd.DataFrame({'bbox': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})
Expected
I want to split this dataframe into 4 columns like this:
pd.DataFrame({'x_min': [1, 5, 9], 'y_min': [2, 6, 10], 'width': [3, 7, 11], 'height': [4, 8, 12]})
What I tried
Attempt 1: Pivoting ❌
First, I tried to explode the column, map each element's position to a column name, and pivot the result like this:
df = df['bbox'].explode().reset_index()
mapping = {
    0: 'x_min',
    1: 'y_min',
    2: 'width',
    3: 'height',
}
# After exploding, every group of 4 consecutive rows comes from one bbox
col_names = (df.index % 4).map(mapping)
df.pivot(columns=col_names, values='bbox')
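For comparison, here is a minimal sketch of how the explode-and-pivot idea could be made to work, assuming the position labels are stored as a real column before pivoting. The names `long`, `wide`, `field`, and `names` are placeholders introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'bbox': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})
names = ['x_min', 'y_min', 'width', 'height']

# One row per array element; the original row label ends up in the 'index' column.
long = df['bbox'].explode().reset_index()

# Position of each element inside its original array (0..3), mapped to a field name.
long['field'] = long.groupby('index').cumcount().map(dict(enumerate(names)))

# Pivot back to one row per original row and one column per field.
wide = long.pivot(index='index', columns='field', values='bbox')

# pivot orders the columns alphabetically, so restore the intended order;
# explode leaves object dtype, so cast back to integers.
wide = wide[names].astype(int)
wide.index.name = None
wide.columns.name = None
```

This round trip through a long format does more work than the other attempts, so it mainly shows why the pivot failed rather than being a recommended approach.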
Attempt 2: NumPy transposing ✅
Because the data is effectively a matrix whose columns I want to extract, I could do it easily with NumPy.
df = pd.DataFrame({'bbox': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})
# Convert the 'bbox' column to a 2-D NumPy array (one row per bbox)
arr = df['bbox'].apply(pd.Series).to_numpy()
# Extract the columns
x_min = arr[:, 0]
y_min = arr[:, 1]
width = arr[:, 2]
height = arr[:, 3]
# Create a new dataframe
df = pd.DataFrame({'x_min': x_min, 'y_min': y_min, 'width': width, 'height': height})
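A shorter variant of the same idea, sketched here under the assumption that every row of `bbox` holds exactly 4 values, is to pass the column's lists straight to the `DataFrame` constructor. The names `names` and `result` are placeholders for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'bbox': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})
names = ['x_min', 'y_min', 'width', 'height']

# Series.tolist() yields a list of 4-element lists, which the DataFrame
# constructor interprets as rows; index=df.index keeps the original row labels.
result = pd.DataFrame(df['bbox'].tolist(), columns=names, index=df.index)
```

This skips both the per-row `apply(pd.Series)` call and the manual column slicing, while producing the same four columns.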
Attempt 3: Using pandas stack ✅
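The idea was to stack the frame, expand each array with pd.Series, unstack the result, and then rename the columns.
# Start again from the original single-column frame
df = pd.DataFrame({'bbox': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})
df = df.stack().apply(pd.Series).unstack()
df.columns = ['x_min', 'y_min', 'width', 'height']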
Question
How can I efficiently split a pandas DataFrame column containing arrays into N separate columns?
What is the most efficient and straightforward way to do it?
I'm wondering if there's a better way to accomplish this task that I might have missed. Any suggestions or insights would be greatly appreciated.
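To make "efficient" measurable, a small timing harness can be run on a larger frame. The sketch below assumes synthetic data of 2,000 rows (an arbitrary size) and compares only two of the approaches; the function names `via_apply` and `via_tolist` are placeholders, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

names = ['x_min', 'y_min', 'width', 'height']

# Synthetic data: 2,000 rows, each holding a 4-element list of integers.
rng = np.random.default_rng(0)
big = pd.DataFrame({'bbox': rng.integers(0, 100, size=(2_000, 4)).tolist()})


def via_apply(frame):
    # Expand each list into a row by calling pd.Series once per row.
    out = frame['bbox'].apply(pd.Series)
    out.columns = names
    return out


def via_tolist(frame):
    # Build the frame in one step from a list of lists.
    return pd.DataFrame(frame['bbox'].tolist(), columns=names, index=frame.index)


repeats = 3
for fn in (via_apply, via_tolist):
    seconds = timeit.timeit(lambda: fn(big), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats:.4f} s per call")
```

Both functions should return the same values, so any difference in the printed numbers reflects only how the columns are built.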