How to transform combinations of values in columns into individual columns?

Question

I have a dataset (df), that looks like this:

Date	ID	County Name	State	State Name	Product Name	Type of Transaction	QTY
202105	10001	Los Angeles	CA	California	Shoes	Entry	630
202012	10002	Houston	TX	Texas	Keyboard	Exit	5493
202001	11684	Chicago	IL	Illionis	Phone	Disposal	220
202107	12005	New York	NY	New York	Phone	Entry	302
...	...	...	...	...	...	...	...
202111	14990	Orlando	FL	Florida	Shoes	Exit	201

For every county, there are multiple entries for different Products, types of transactions, and at different dates, but not all counties have the same number of entries and they don't follow the same dates.

I want to recreate this dataset, such that: 1 - All counties have the same start and end dates, and for those dates where the county does not record entries, I want this entry to be recorded as NaN. 2 - The product names and their types are their own columns.

Essentially, this is how the dataset needs to look:

Date	ID	County Name	State	State Name	Shoes, Entry	Shoes, Exit	Shoes, Disposal	Phones, Entry	Phones, Exit	Phones, Disposal	Keyboard, Entry	Keyboard, Exit	Keyboard, Disposal
202105	10001	Los Angeles	CA	California	594	694	5660	33299	1110	5659	4559	3223	56889
202012	10002	Houston	TX	Texas	3420	4439	549	2110	5669	2245	39294	3345	556
202001	11684	Chicago	IL	Illionis	55432	4439	329	21190	4320	455	34059	44556	5677
202107	12005	New York	NY	New York	34556	2204	4329	11193	22345	43221	1544	3467	22450
...	...	...	...	...	...	...	...	...	...	...	...	...	...
202111	14990	Orlando	FL	Florida	54543	23059	3290	21394	34335	59660	NaN	NaN	NaN

Under the example, you can see how Florida does not record certain transactions. I would like to add the NaN such that the dataframe looks like this. I appreciate all the help!

Please use correct formatting for tables in Markdown. Please read [this post on how to use tables in posts](https://meta.stackoverflow.com/questions/277716/how-can-i-create-a-table-in-a-post) — akaAbdullahMateen, May 05 '22 at 03:15
@akaAbdullahMateen I tried using tables formatting but I think the table was too large. I went ahead and updated the tables as images. — RafaelP, May 05 '22 at 03:20
Don't use images, it was just missing a newline between the text and table — mozway, May 05 '22 at 03:21
@mozway My apologies! I am still new to asking questions on stack overflow. Thank you for the help! — RafaelP, May 05 '22 at 03:22
Where do the numbers in the second dataframe come from? How do you get, 594, 694, and 5660? All I see up top is 630. — Tim Roberts, May 05 '22 at 03:26
@TimRoberts The numbers in the second dataframe would be the Specific line values. For example, if the Type of Transaction is Exit and the Product is Shoes, and the quantity associated to that line is 600, it would be recorded under Shoes, Exit, 600 — RafaelP, May 05 '22 at 03:29
Does this answer your question? [How can I pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) — Ynjxsjmh, May 05 '22 at 03:35
@Ynjxsjmh, I don't think it does. These dataframes are being pivoted by rows, in general. I am trying to essentially merge Product Name and Transaction Type, and break these two columns into its groupings. Perhaps merging both of these columns is a start... — RafaelP, May 05 '22 at 04:01

score 1 · Accepted Answer · answered May 05 '22 at 04:44

This is essentially a pivot, with flattening of the MultiIndex:

(df
 .pivot(index=['Date', 'ID', 'County Name', 'State', 'State Name'],
        columns=['Product Name', 'Type of Transaction'],
        values='QTY')
 .pipe(lambda d: d.set_axis(map(','.join, d. columns), axis=1))
 .reset_index()
 )

Output:

     Date     ID  County Name State  State Name  Shoes,Entry  Keyboard,Exit  \
0  202001  11684      Chicago    IL    Illionis          NaN            NaN   
1  202012  10002      Houston    TX       Texas          NaN         5493.0   
2  202105  10001  Los Angeles    CA  California        630.0            NaN   
3  202107  12005     New York    NY    New York          NaN            NaN   

   Phone,Disposal  Phone,Entry  
0           220.0          NaN  
1             NaN          NaN  
2             NaN          NaN  
3             NaN        302.0

I have tried inputting this code into my dataset, and received the following error: ValueError: Index contains duplicate entries, cannot reshape. Also tried adding aggfunc='first', in order to aggregate indexes but it didn't work! — RafaelP, May 05 '22 at 04:52

How to transform combinations of values in columns into individual columns?

1 Answers1