Pandas transform rows to columns

Question

I have a pandas dataframe that looks like this:

index	p1	a1	phase	file_number	e1
388	19.288	21.630	0.0	0	0.0
389	40.910	71.489	1.0	0	0.0
390	31.310	43.952	2.0	0	0.0
391	28.420	30.250	3.0	0	0.0
392	17.940	22.000	0.0	1	0.0
393	38.020	68.750	1.0	1	0.0
394	31.230	48.352	2.0	1	1.0
395	26.902	29.880	3.0	1	0.0

We can create it using this code

d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
     'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
     'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
     'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
     'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 1.0,394: 0.0,395: 0.0}}

df = pd.DataFrame(d)

For I want to transform this dataframe, so I have 1 row for every file_number. And transform it with respect to phase - basically collapse many rows into one for each file_number. Phase number will always be 0, 1, 2, 3. Final table should look like this:

p1_0	p1_1	p1_2	p1_3	a1_0	p1_1	a1_2	a1_3	e1_0	e1_1	e1_2	e1_3
19.288	40.910	31.310	28.420	21.630	71.489	43.952	30.250	0	0	0	0
17.940	38.020	31.230	26.902	22.000	68.750	48.352	29.880	0	0	1	0

Where suffix means p1_phase, a1_phase and so on.

I want to make it as fast as possible. Since my data is very large, I'd rather avoid looping.

there are numerous answers on Stack Overflow for this. It's called pivoting in pandas. Look at `stack`, `pivot` or `pivot_table` — sammywemmy, Jul 14 '21 at 11:40
Read Q/A 10 in [this question](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe). — Quang Hoang, Jul 14 '21 at 11:47

score 1 · Accepted Answer · answered Jul 14 '21 at 14:42

d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
     'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
     'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
     'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
     'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 1.0,394: 0.0,395: 0.0}}

df = pd.DataFrame(d)
# pivot the data
pivoted = df.pivot(index='file_number', columns='phase')
# flatten the columns
pivoted.columns = [f'{col[0]}_{int(col[1])}'  for col in pivoted.columns.values]

After this pivoted is a Dataframe with the shape you desired.

Basically a combination of these two questions:

Thanks, this one worked! The way that I found was res = pd.pivot_table(df, index=['file_number', 'phase']).unstack().reset_index(drop=True) res.columns = [f'{col[0]}_{int(col[1])}' for col in res.columns] but your solution is cleaner — Evangelista01, Jul 14 '21 at 15:01
:) That's about the same. If file_number is unimportant, it makes sense to re-index and drop it. — Chris Wesseling, Jul 14 '21 at 15:26

Pandas transform rows to columns

1 Answers1