I have a dataset (df), that looks like this:
Date | ID | County Name | State | State Name | Product Name | Type of Transaction | QTY |
---|---|---|---|---|---|---|---|
202105 | 10001 | Los Angeles | CA | California | Shoes | Entry | 630 |
202012 | 10002 | Houston | TX | Texas | Keyboard | Exit | 5493 |
202001 | 11684 | Chicago | IL | Illionis | Phone | Disposal | 220 |
202107 | 12005 | New York | NY | New York | Phone | Entry | 302 |
... | ... | ... | ... | ... | ... | ... | ... |
202111 | 14990 | Orlando | FL | Florida | Shoes | Exit | 201 |
For every county, there are multiple entries for different Products, types of transactions, and at different dates, but not all counties have the same number of entries and they don't follow the same dates.
I want to recreate this dataset, such that: 1 - All counties have the same start and end dates, and for those dates where the county does not record entries, I want this entry to be recorded as NaN. 2 - The product names and their types are their own columns.
Essentially, this is how the dataset needs to look:
Date | ID | County Name | State | State Name | Shoes, Entry | Shoes, Exit | Shoes, Disposal | Phones, Entry | Phones, Exit | Phones, Disposal | Keyboard, Entry | Keyboard, Exit | Keyboard, Disposal |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
202105 | 10001 | Los Angeles | CA | California | 594 | 694 | 5660 | 33299 | 1110 | 5659 | 4559 | 3223 | 56889 |
202012 | 10002 | Houston | TX | Texas | 3420 | 4439 | 549 | 2110 | 5669 | 2245 | 39294 | 3345 | 556 |
202001 | 11684 | Chicago | IL | Illionis | 55432 | 4439 | 329 | 21190 | 4320 | 455 | 34059 | 44556 | 5677 |
202107 | 12005 | New York | NY | New York | 34556 | 2204 | 4329 | 11193 | 22345 | 43221 | 1544 | 3467 | 22450 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
202111 | 14990 | Orlando | FL | Florida | 54543 | 23059 | 3290 | 21394 | 34335 | 59660 | NaN | NaN | NaN |
Under the example, you can see how Florida does not record certain transactions. I would like to add the NaN such that the dataframe looks like this. I appreciate all the help!