I have a pandas DataFrame with a column in which every cell holds an array of 4 values, and I want to split that column into 4 separate columns. In other words, I want to pivot it without keeping the original column name.

I already followed the official pandas reshaping guide.

Input

import pandas as pd
df = pd.DataFrame({'bbox': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})

Expected

I want to split this dataframe into 4 columns like this:

pd.DataFrame({'x_min': [1, 5, 9], 'y_min': [2, 6, 10], 'width': [3, 7, 11], 'height': [4, 8, 12]})

What I tried

Attempt 1: Pivoting ❌

First, I tried to explode the column, label each value by its position within its original array, and pivot on that label:

df = df['bbox'].explode().reset_index()
mapping = {
    0: 'x_min',
    1: 'y_min',
    2: 'width',
    3: 'height',
}
col_names = (df.index % 4).map(mapping)
df.pivot(columns=col_names, values='bbox')
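For reference, here is a minimal sketch of how the explode-and-pivot idea might be made to work. It assumes every cell holds exactly four values and that the original frame has a default unnamed index (so `reset_index()` produces a column called `index`). The names `long`, `wide`, `names` and `field` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'bbox': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})
names = ['x_min', 'y_min', 'width', 'height']

# One row per value; the original row label ends up in the 'index' column
long = df['bbox'].explode().reset_index()

# Position of each value inside its original list (0..3), mapped to a name
long['field'] = long.groupby('index').cumcount().map(dict(enumerate(names)))

# Pivot back to one row per original row, one column per field
wide = long.pivot(index='index', columns='field', values='bbox')

# pivot sorts the columns alphabetically, so restore the intended order;
# explode leaves object dtype, so cast back to integers
wide = wide[names].astype(int)
wide.index.name = None
wide.columns.name = None
```

The key difference from my attempt is that the label is stored as a real column before pivoting, because `pivot` expects column names rather than an array of values.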

Attempt 2: NumPy column slicing ✅

Because the data is effectively a matrix and I want to extract its columns, I could do it with NumPy:

df = pd.DataFrame({'bbox': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})

# Convert the df to a numpy matrix
df = df['bbox'].apply(pd.Series).to_numpy()

# Extract the columns
x_min = df[:, 0]
y_min = df[:, 1]
width = df[:, 2]
height = df[:, 3]

# Create a new dataframe
df = pd.DataFrame({'x_min': x_min, 'y_min': y_min, 'width': width, 'height': height})

Attempt 3: Using pandas stack

The idea was to stack the frame, expand each list with pd.Series, unstack the result, and then rename the columns.

df = df.stack().apply(pd.Series).unstack()
df.columns = ['x_min', 'y_min', 'width', 'height']

Question

How can I efficiently split a pandas DataFrame column containing arrays into N separate columns?

What is the most efficient and straightforward way to do it?

I'm wondering if there's a better way to accomplish this task that I might have missed. Any suggestions or insights would be greatly appreciated.
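One candidate I am considering is to build the new frame directly from the column converted to a list of lists. This is only a sketch, assuming every cell has exactly four values. `names` and `result` are illustrative names, and I have not benchmarked it against the attempts above.

```python
import pandas as pd

df = pd.DataFrame({'bbox': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})
names = ['x_min', 'y_min', 'width', 'height']

# Convert the column to a plain list of lists and let the DataFrame
# constructor lay it out as rows; reuse the original index so the
# result stays aligned with df
result = pd.DataFrame(df['bbox'].tolist(), columns=names, index=df.index)
```

This avoids creating one pd.Series per row, which is what `apply(pd.Series)` does.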

Olivier D'Ancona