Transforming the data from a single column into multiple columns in pandas

Question

Im working with a database that contains the following structure

ID	Test	Result
12a	Test1	Normal
12a	Test3	678.2
3s5	Test2	<1
3s5	Test1	Normal
8r5	Test4	Rectangular(3+)

As you can see, the different tests have different result formats, and not every ID have all the tests.

I would like to transform this into something as follows:

ID	Test1	Test2	Test3	Test4
12a	Normal	NA	678.2	NA
3s5	Normal	<1	NA	NA
8r5	NA	NA	NA	Rectangular(3+)

I've tried with pandas.pivot but encountered the following error

df.pivot(index="ID",columns="Test",values="Result")

ValueError: Index contains duplicate entries, cannot reshape

Changing the Index to ID does not work, neither resetting index.

Any help will be greatly appreciated!

Does this answer your question? [How can I pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) or https://stackoverflow.com/questions/46528599/pandas-pivot-produces-valueerror-index-contains-duplicate-entries-cannot-resh — Chris, Sep 22 '21 at 22:50

score 2 · Accepted Answer · answered Sep 22 '21 at 23:33

2

You can try using .pivot_table() instead of .pivot(), as follows:

df.pivot_table(index="ID", columns="Test", values="Result", aggfunc='first')

Result:

Test   Test1 Test2  Test3            Test4
ID                                        
12a   Normal   NaN  678.2              NaN
3s5   Normal    <1    NaN              NaN
8r5      NaN   NaN    NaN  Rectangular(3+)

answered Sep 22 '21 at 23:33

SeaBean

22,547
3
13
25

Thanks! I had tried using `pivot_table` but was missing `aggfunc="first"` argument. Works like a charm! – Mario_B Sep 22 '21 at 23:43
2

@Mario_B Without the `aggfunc="first"` argument, the default is `aggfunc='mean'` which requires numeric values to work properly. As your data contains non-numeric data, using `aggfunc="first"` is a trick to make `pivot_table()` work for this kind of data. – SeaBean Sep 23 '21 at 00:09

score 1 · Answer 2 · answered Sep 22 '21 at 23:15

Here is a way of doing it :

df = {'ID': ['12a', '12a', '3s5', '3s5', '8r5'],
  'Test': ['Test1', 'Test3', 'Test2', 'Test1', 'Test4'],
  'Result': ['Normal', '678.2', '<1', 'Normal', 'Rectangular(3+)']}

df=df.groupby(['ID', 'Test'])['Result'].sum().unstack(fill_value="NA")

Transforming the data from a single column into multiple columns in pandas

2 Answers2