I have the following data set containing the quantity of phytosanitary products purchased per zip code in france between 2015 and 2019 with its classification (other,toxic,mineral,organic).
the dataframe looks like this, so with the zip_code, the year and the classification you can see the quantity that was purchased
zip_code | year | classification | total_quantity |
---|---|---|---|
01000 | 2015 | other | 44.305436 |
01000 | 2015 | toxic | 212.783330 |
01000 | 2015 | mineral | value |
01000 | 2015 | organic | value |
01000 | 2016 | other | value |
01000 | 2016 | toxic | value |
01000 | 2016 | mineral | value |
it follows the same pattern .....
zip_code | year | classification | total_quantity |
---|---|---|---|
01000 | 2019 | organic | value |
01090 | 2015 | other | value |
but I would like something where you have only one entry per zip code like this (of course going to 2019 and not stoping at 2016 like i did in my exemple)
zip_code | other_total-quantity-2015 | Toxic_total-quantity-2015 | Mineral_total-quantity-2015 | organic_total-quantity-2015 | other_total-quantity-2016 | Toxic_total-quantity-2016 |
---|---|---|---|---|---|---|
01000 | value | value | value | value | value | value |
01090 | value | value | value | value | value |
I tried to do this using the reshape function but the closest i got from what i want is a table where the zip_code is repeated 4 times (for every classification).
Thank you