I have a pandas dataframe that looks like this:
index | p1 | a1 | phase | file_number | e1 |
---|---|---|---|---|---|
388 | 19.288 | 21.630 | 0.0 | 0 | 0.0 |
389 | 40.910 | 71.489 | 1.0 | 0 | 0.0 |
390 | 31.310 | 43.952 | 2.0 | 0 | 0.0 |
391 | 28.420 | 30.250 | 3.0 | 0 | 0.0 |
392 | 17.940 | 22.000 | 0.0 | 1 | 0.0 |
393 | 38.020 | 68.750 | 1.0 | 1 | 0.0 |
394 | 31.230 | 48.352 | 2.0 | 1 | 1.0 |
395 | 26.902 | 29.880 | 3.0 | 1 | 0.0 |
We can create it using this code
d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 1.0,394: 0.0,395: 0.0}}
df = pd.DataFrame(d)
For I want to transform this dataframe, so I have 1 row for every file_number. And transform it with respect to phase - basically collapse many rows into one for each file_number. Phase number will always be 0, 1, 2, 3. Final table should look like this:
p1_0 | p1_1 | p1_2 | p1_3 | a1_0 | p1_1 | a1_2 | a1_3 | e1_0 | e1_1 | e1_2 | e1_3 |
---|---|---|---|---|---|---|---|---|---|---|---|
19.288 | 40.910 | 31.310 | 28.420 | 21.630 | 71.489 | 43.952 | 30.250 | 0 | 0 | 0 | 0 |
17.940 | 38.020 | 31.230 | 26.902 | 22.000 | 68.750 | 48.352 | 29.880 | 0 | 0 | 1 | 0 |
Where suffix means p1_phase, a1_phase and so on.
I want to make it as fast as possible. Since my data is very large, I'd rather avoid looping.