0

I have a dataframe that contains sales data including the sold item and the date of the sale. There is no quantity to sum, every sale is one of the item. Here is a same of the data:

date_of_sale    item
2020/01/01  apple
2020/01/02  peach
2020/01/03  grape
2020/01/01  banana
2020/01/02  apple
2020/01/03  peach
2020/01/01  grape
2020/01/02  banana
2020/01/03  apple
2020/01/01  peach
2020/01/02  grape
2020/01/01  banana
2020/01/02  apple
2020/01/03  peach
2020/01/01  grape
2020/01/02  banana
2020/01/03  apple

I would like to create a new dataframe from this existing dataframe that simply sums the number of sales for each item and groups them by day. In SQL, this is easy:

select
date_of_sale
, sum(if (item = "orange", 1, 0)) 'is_orange'
, sum(if (item = "apple", 1, 0)) 'is_apple'
, sum(if (item = "pear", 1, 0)) 'is_pear'
, sum(if (item = "grape", 1, 0)) 'is_grape'
, sum(if (item = "cherry", 1, 0)) 'is_cherry'
, sum(if (item = "banana", 1, 0)) 'is_banana'
, sum(if (item = "peach", 1, 0)) 'is_peach'
FROM 
    sales
group by 1
order by 1 asc

This would result in a nice clean table of sales that I could then easily graph. The resulting dataframe should look like this:

date    is_apple    is_banana   is_grape    is_peach
2020/01/01  1   2   2   1
2020/01/02  2   2   1   1
2020/01/03  2   1   2   0

What is the best way to do this with Pandas? I've tried some ugly ways that are not very pythonic like making a bunch of new columns manually with a 0 or 1 depending on T/F. This is a very long code block as there are too many items in the real life dataset.

Any help would be greatly appreciated!

mikernova
  • 55
  • 5
  • 1
    You've provided a sql query, not a sample of your data, so it's not entirely clear what you're asking. In the absence of a [mcve], it look slike you want to use [pandas groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) – G. Anderson Aug 13 '20 at 22:54
  • Thanks for adding a minimal reproducible example. Now, i'ts not too clear what you want to do, please add the desired output and what you have tried. – MrNobody33 Aug 13 '20 at 23:15
  • Wow, thanks so much @MrNobody33, that was all I needed. `df.groupby(['sale_date','item']).size().unstack(fill_value=0).plot(subplots=True, figsize=(30, 10))` and that plotted the spark lines I was looking for lickey split! – mikernova Aug 13 '20 at 23:41
  • You're welcome @mikernova ! Thanks for adding the things we asked for, that made your question a lot cleaner! Glad you found an answer!:) – MrNobody33 Aug 14 '20 at 00:03

0 Answers0