I have a dataframe that contains sales data including the sold item and the date of the sale. There is no quantity to sum, every sale is one of the item. Here is a same of the data:
date_of_sale item
2020/01/01 apple
2020/01/02 peach
2020/01/03 grape
2020/01/01 banana
2020/01/02 apple
2020/01/03 peach
2020/01/01 grape
2020/01/02 banana
2020/01/03 apple
2020/01/01 peach
2020/01/02 grape
2020/01/01 banana
2020/01/02 apple
2020/01/03 peach
2020/01/01 grape
2020/01/02 banana
2020/01/03 apple
I would like to create a new dataframe from this existing dataframe that simply sums the number of sales for each item and groups them by day. In SQL, this is easy:
select
date_of_sale
, sum(if (item = "orange", 1, 0)) 'is_orange'
, sum(if (item = "apple", 1, 0)) 'is_apple'
, sum(if (item = "pear", 1, 0)) 'is_pear'
, sum(if (item = "grape", 1, 0)) 'is_grape'
, sum(if (item = "cherry", 1, 0)) 'is_cherry'
, sum(if (item = "banana", 1, 0)) 'is_banana'
, sum(if (item = "peach", 1, 0)) 'is_peach'
FROM
sales
group by 1
order by 1 asc
This would result in a nice clean table of sales that I could then easily graph. The resulting dataframe should look like this:
date is_apple is_banana is_grape is_peach
2020/01/01 1 2 2 1
2020/01/02 2 2 1 1
2020/01/03 2 1 2 0
What is the best way to do this with Pandas? I've tried some ugly ways that are not very pythonic like making a bunch of new columns manually with a 0 or 1 depending on T/F. This is a very long code block as there are too many items in the real life dataset.
Any help would be greatly appreciated!