-1

[SOLVED by Changing the date format for pandas to parse]

I am trying to calculate Month Name and I checked there is a massive difference when I am using dt.month_name and Excel's month to get the month of transaction.

Context: I am running an analysis on my own bank transaction. For the simplicity, I have two dates - Value date and Transaction date. I have converted both to datetime format and used dt.month to get the month names. After facing some problems and doing a RCA, I came to the conclusion that the value_counts of months calculated in excel is vastly different from the ones calculated in Python.

enter image description here

This is after using Transaction date to calculate the month value (py_Month)

enter image description here

This is after using Value Date to calculate the month (py_month)

The Entire code is as follows:

data = pd.read_excel("data.xlsx",sheet_name="Sheet1")
data = data.drop(["S No."],axis=1)
    
# Convert to datetime.
date_cols = ["Value Date","Transaction Date"]
data[date_cols] = data[date_cols].apply(pd.to_datetime)   

val_date  = "Value Date"
tnx_date = "Transaction Date"

data["py_Month"] = data[val_date].dt.month_name()
# data["py_Month"] = data[tnx_date].dt.month_name()
   

vc_df = data[['xl_month', 'py_Month']].apply(pd.Series.value_counts)
vc_df
  • 1
    please provide a [reproducible](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) dataset as input, it's otherwise impossible to reproduce your issue – mozway Jul 04 '23 at 12:33
  • Please, provide a [reproducible pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – alec_djinn Jul 04 '23 at 12:52
  • I do have that ready and on drive. But Google sheets is acting differently from my Excel on desktop. Added the link in the post, though – Jayit Ghosh Jul 04 '23 at 12:52
  • you have date string constants in your xl_month column, you should fix them first – Stef Jul 04 '23 at 13:28

1 Answers1

0

Solved it.

The problem was with the parsing of the date formats in Excel and Python.

Explicitly mentioning the format resolved the problem:

data[date_cols] = data[date_cols].apply(pd.to_datetime, format='%d/%m/%Y')