0

I have a dataset (df), that looks like this:

Date ID County Name State State Name Product Name Type of Transaction QTY
202105 10001 Los Angeles CA California Shoes Entry 630
202012 10002 Houston TX Texas Keyboard Exit 5493
202001 11684 Chicago IL Illionis Phone Disposal 220
202107 12005 New York NY New York Phone Entry 302
... ... ... ... ... ... ... ...
202111 14990 Orlando FL Florida Shoes Exit 201

For every county, there are multiple entries for different Products, types of transactions, and at different dates, but not all counties have the same number of entries and they don't follow the same dates.

I want to recreate this dataset, such that: 1 - All counties have the same start and end dates, and for those dates where the county does not record entries, I want this entry to be recorded as NaN. 2 - The product names and their types are their own columns.

Essentially, this is how the dataset needs to look:

Date ID County Name State State Name Shoes, Entry Shoes, Exit Shoes, Disposal Phones, Entry Phones, Exit Phones, Disposal Keyboard, Entry Keyboard, Exit Keyboard, Disposal
202105 10001 Los Angeles CA California 594 694 5660 33299 1110 5659 4559 3223 56889
202012 10002 Houston TX Texas 3420 4439 549 2110 5669 2245 39294 3345 556
202001 11684 Chicago IL Illionis 55432 4439 329 21190 4320 455 34059 44556 5677
202107 12005 New York NY New York 34556 2204 4329 11193 22345 43221 1544 3467 22450
... ... ... ... ... ... ... ... ... ... ... ... ... ...
202111 14990 Orlando FL Florida 54543 23059 3290 21394 34335 59660 NaN NaN NaN

Under the example, you can see how Florida does not record certain transactions. I would like to add the NaN such that the dataframe looks like this. I appreciate all the help!

mozway
  • 194,879
  • 13
  • 39
  • 75
RafaelP
  • 83
  • 7
  • Please use correct formatting for tables in Markdown. Please read [this post on how to use tables in posts](https://meta.stackoverflow.com/questions/277716/how-can-i-create-a-table-in-a-post) – akaAbdullahMateen May 05 '22 at 03:15
  • @akaAbdullahMateen I tried using tables formatting but I think the table was too large. I went ahead and updated the tables as images. – RafaelP May 05 '22 at 03:20
  • 1
    Don't use images, it was just missing a newline between the text and table – mozway May 05 '22 at 03:21
  • @mozway My apologies! I am still new to asking questions on stack overflow. Thank you for the help! – RafaelP May 05 '22 at 03:22
  • Where do the numbers in the second dataframe come from? How do you get, 594, 694, and 5660? All I see up top is 630. – Tim Roberts May 05 '22 at 03:26
  • @TimRoberts The numbers in the second dataframe would be the Specific line values. For example, if the Type of Transaction is Exit and the Product is Shoes, and the quantity associated to that line is 600, it would be recorded under Shoes, Exit, 600 – RafaelP May 05 '22 at 03:29
  • Does this answer your question? [How can I pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) – Ynjxsjmh May 05 '22 at 03:35
  • @Ynjxsjmh, I don't think it does. These dataframes are being pivoted by rows, in general. I am trying to essentially merge Product Name and Transaction Type, and break these two columns into its groupings. Perhaps merging both of these columns is a start... – RafaelP May 05 '22 at 04:01

1 Answers1

1

This is essentially a pivot, with flattening of the MultiIndex:

(df
 .pivot(index=['Date', 'ID', 'County Name', 'State', 'State Name'],
        columns=['Product Name', 'Type of Transaction'],
        values='QTY')
 .pipe(lambda d: d.set_axis(map(','.join, d. columns), axis=1))
 .reset_index()
 )

Output:

     Date     ID  County Name State  State Name  Shoes,Entry  Keyboard,Exit  \
0  202001  11684      Chicago    IL    Illionis          NaN            NaN   
1  202012  10002      Houston    TX       Texas          NaN         5493.0   
2  202105  10001  Los Angeles    CA  California        630.0            NaN   
3  202107  12005     New York    NY    New York          NaN            NaN   

   Phone,Disposal  Phone,Entry  
0           220.0          NaN  
1             NaN          NaN  
2             NaN          NaN  
3             NaN        302.0  
mozway
  • 194,879
  • 13
  • 39
  • 75
  • I have tried inputting this code into my dataset, and received the following error: ValueError: Index contains duplicate entries, cannot reshape. Also tried adding aggfunc='first', in order to aggregate indexes but it didn't work! – RafaelP May 05 '22 at 04:52
  • Then you need to use `pivot_table` – mozway May 05 '22 at 04:53