My objective in the following dataframe is to count how many cars were sold in a specific location for a given month. You will note that the pivot result are correct BUT the columns (dates) are inconsistent. The reason why it's inconsistent is because I'm aggregating the month as a string (if a car was sold on 9-October and 10-October then I count 2 for October).
Question: how should I design the applied get_date in such was that the pivot table would sort the dates correctly? Or, is there a better way to approach this problem to get the desired result.
Table
pivot table result:
The dataframe:
df = pd.DataFrame({"LOCATION":["USA","USA","USA","USA","JAPAN","JAPAN"],"CAR":["BMW","BMW","TOYOTA","FORD","BMW","FORD"],"SALE_DATE":pd.to_datetime(["2017-10-09","2017-09-28","2017-10-09","2017-10-12","2017-09-30","2017-09-30"])})
the apply date_str_date (to aggregate by month):
def get_date_str(row):
doi = row["SALE_DATE"]
doi_str = "".join(doi.strftime('%B-%y'))
return doi_str
the pivot:
df.pivot_table(values="CAR", index="LOCATION", columns='SALE_DATE', aggfunc='count')