2

My objective in the following dataframe is to count how many cars were sold in a specific location for a given month. You will note that the pivot result are correct BUT the columns (dates) are inconsistent. The reason why it's inconsistent is because I'm aggregating the month as a string (if a car was sold on 9-October and 10-October then I count 2 for October).

Question: how should I design the applied get_date in such was that the pivot table would sort the dates correctly? Or, is there a better way to approach this problem to get the desired result.

Table

enter image description here

pivot table result:

enter image description here

The dataframe:

df = pd.DataFrame({"LOCATION":["USA","USA","USA","USA","JAPAN","JAPAN"],"CAR":["BMW","BMW","TOYOTA","FORD","BMW","FORD"],"SALE_DATE":pd.to_datetime(["2017-10-09","2017-09-28","2017-10-09","2017-10-12","2017-09-30","2017-09-30"])})

the apply date_str_date (to aggregate by month):

def get_date_str(row):
    doi = row["SALE_DATE"]
    doi_str = "".join(doi.strftime('%B-%y'))
    return doi_str

the pivot:

df.pivot_table(values="CAR", index="LOCATION", columns='SALE_DATE', aggfunc='count')
adhg
  • 10,437
  • 12
  • 58
  • 94
  • Try `df.SALE_DATE = pd.to_datetime(df.SALE_DATE).dt.month`, and then pivot from there. – Ami Tavory May 02 '18 at 19:37
  • See also [this question](https://stackoverflow.com/questions/37625334/python-pandas-convert-month-int-to-month-name) if you want the month names in strings. – Ami Tavory May 02 '18 at 19:38
  • month as string is an 'object' so it will have an alphabetical order. I'll get the same result. Another way is to apply a method that adjust all months to the beginning of the month (i.e. 4-Oct-17 becomes 1-Oct-17) and then pivot around columbs="first_of_month". df.SALE_DATE = pd.to_datetime(df.SALE_DATE).dt.month gives a number (9 for Sep. I need the year as well). Thanks – adhg May 02 '18 at 20:09
  • So first pivot according to the months as numbers, sort, then transform to strings, no? – Ami Tavory May 02 '18 at 20:15
  • Ok. (as I'm quite new to python) the columns would be a string, no? so I'll have to define one by one wth df.rename(columns={0:["SEP... – adhg May 02 '18 at 20:22
  • thanks for all the pointers. Very valuable! – adhg May 02 '18 at 21:11

1 Answers1

1

You can do the following:

First, extract the month from the date:

df.SALE_DATE = pd.to_datetime(df.SALE_DATE).dt.month

Now pivot:

df = df.pivot_table(values="CAR", index="LOCATION", columns='SALE_DATE', aggfunc='count').fillna(0)

Sort by the dates: df = df.reindex_axis(sorted(df.columns), axis=1)

Transform them into strings:

months = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May',
            6: 'Jun', 7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'}
df.columns = [months[c] for c in df.columns]
>>> df
                 Sep    Oct
LOCATION        
JAPAN            2.0    0.0
USA              1.0    3.0
Ami Tavory
  • 74,578
  • 11
  • 141
  • 185