I am reading data from SQL Server to S3 as a parquet file. In SQL Server, my data type is date and the format is 2022-09-01, like a date should be.
When I read the parquet file using pandas with the code below:
import pandas as pd

# Show all columns and rows when printing
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
df = pd.read_parquet(r"path\to\file.parquet", engine='fastparquet')
print(df)
It automatically converts the date data type from the source to datetime64[ns] in the target parquet file. I don’t know why it does this. The format of the column looks the same as the source, 2022-09-01, but the data type is datetime.
For other columns the source data type was datetime and it converted to datetime. For this one it was date, and it also converted to datetime.
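To make the symptom concrete without the real file, here is a minimal sketch of what I am seeing. The column name `order_date`, the name `df_demo`, and the sample values are made up for illustration, and the DataFrame is built by hand rather than read from parquet, so it only mimics the result of the read.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by pd.read_parquet
df_demo = pd.DataFrame({"order_date": pd.to_datetime(["2022-09-01", "2022-09-02"])})

# The column has a datetime dtype rather than a date-only type
is_datetime = pd.api.types.is_datetime64_any_dtype(df_demo["order_date"])

# Every value sits exactly at midnight, so the dates themselves are unchanged
all_midnight = bool(
    (df_demo["order_date"] == df_demo["order_date"].dt.normalize()).all()
)

print(df_demo.dtypes)
print(is_datetime, all_midnight)
```

So the values look like dates when printed, but the dtype reported by pandas is a datetime one.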
How can I stop this?
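The only workaround I can think of is converting the column back after the read. A rough sketch, assuming the column really is loaded with a datetime dtype: `to_plain_dates` and `order_date` are hypothetical names, and the DataFrame here is built by hand instead of coming from my file. As far as I can tell, the result is a column of plain `datetime.date` objects with dtype `object`, not a dedicated date dtype.

```python
import datetime

import pandas as pd


def to_plain_dates(frame, column):
    """Return a copy of frame with `column` converted to datetime.date objects."""
    result = frame.copy()
    # .dt.date drops the time-of-day part and yields Python date objects
    result[column] = result[column].dt.date
    return result


# Hypothetical stand-in for the DataFrame returned by pd.read_parquet
loaded = pd.DataFrame({"order_date": pd.to_datetime(["2022-09-01"])})
converted = to_plain_dates(loaded, "order_date")

print(type(converted["order_date"].iloc[0]))
```

I would rather not do this for every date column by hand if the reader can be told to keep dates as dates.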
I don’t know what to tell the team that does quality assurance checks. They keep asking me why this happens, and the only answer I have is that this is just how the parquet reader does it. Is that right?