0

How to modify the data set such that it shows per month for each country how many purchases were made?

enter image description here

This is part of the dataset I have. There are hundreds of orders per date per country. I would like to modify the data set such that it shows per month for each country how many purchases were made.

I am beginner in data science, any help will be highly appreciated.

So far I have sorted the countries which I would like to use in my report.

 datasetCountry = dataset[dataset['Country'].isin(["France","Belgium","EIRE","Germany","Netherlands","Norway","Portugal","Spain","Switzerland"])].copy()
 datasetCountry.head()

But I do not have an idea how to count the orders per month for each of them

Dakata
  • 1,227
  • 2
  • 14
  • 33
  • Please provide a copyable [mcve]. – cs95 Jan 10 '19 at 18:14
  • So you want to *group* your data by date and country and count the entries? That's a [Group-by problem](https://pandas.pydata.org/pandas-docs/version/0.22/groupby.html). – Martijn Pieters Jan 10 '19 at 18:17
  • Change the format of the invoice date to Y-M, then `datasetCountry.groupby(['InvoiceDate', 'Country'])['Quantity'].sum()` should work, note `count()` and `sum()` will return different values. `count()` will return the number of times quantity is seen for a country and `sum()` will add the values in this column. – d_kennetz Jan 10 '19 at 18:17
  • `how many purchases were made` is ambiguous. Does that mean how many orders/invoices or sum of quantity? – jpp Jan 10 '19 at 18:21
  • @jpp -> i.e. how many invoices were created – Dakata Jan 10 '19 at 18:22
  • @Dakata, Then you need `groupby` with `nunique` on `InvoiceNo`. – jpp Jan 10 '19 at 18:23
  • But how can I group them per month per country, because many of the orders were made in the same day? – Dakata Jan 10 '19 at 18:27
  • df = pd.DataFrame({'Date': pd.to_datetime(['2012-02-08', '2012-02-20']), 'Orders': [5, 10]}) # normalize day to beginning of month df['NewMonth'] = df['Date'] + pd.offsets.MonthBegin(1) #update here df['NewMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day-1, unit='D') #group by month new = df.groupby('NewMonth') #output summary = new['Orders'].sum() – Maria Nazari Jan 10 '19 at 18:31
  • @Maria Nazari example? – Dakata Jan 10 '19 at 18:33
  • @Dakata i provided an example in my comment, since your question is labeled as duplicated I can't post an answer. Paste by comment and separate by comments – Maria Nazari Jan 10 '19 at 18:34
  • useful link https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/ – Maria Nazari Jan 10 '19 at 18:39
  • the last line is this -> new['Orders'].sum() <- ? – Dakata Jan 10 '19 at 18:40
  • summary = new['Orders'].sum() – Maria Nazari Jan 10 '19 at 18:55

0 Answers0