1

I am reading data from SQL Server to S3 as a parquet file. In SQL Server, my data type is date and the format is 2022-09-01 like a date should be.

When I read the parquet file using pandas with the code below:

df=pd.read_parquet(r"path\to\file.parquet", engine='fastparquet')

pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

print(df)

It automatically converts the date datatype from the source, to datetime64[ns] in the target parquet file. I don’t know why it does this. The format of the column looks the same as the source, 2022-09-01 but the data type is datetime.

For other columns the source data type was datetime and it converted to datetime, for this one it was date and converted to datetime.

How can I stop this?

I don’t know what to tell the team that does quality assurance checks, they keep bugging me asking me why. I don’t know because that’s just how parquet reader does it?

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • Is there such thing as “date” datatype in python or just datetime and object? – Stack_mobile2 Apr 04 '23 at 20:31
  • 2
    If the SQL Server type is `date`, the format is **NOT** `yyyy-MM-dd`. It's **binary** and not human readable. Anything you see different is a convenience provided by the tooling. – Joel Coehoorn Apr 04 '23 at 21:07

1 Answers1

2

You're conflating how dates are stored with how dates are rendered. They're two separate considerations.

Dates, or really any data, isn't stored in a format. It's stored in a manner that maximizes efficiency for the SQL engine. Have a look at When storing a datetime in sql server (datetime type), what format does it store it in? for a more in-depth discussion.

For your purposes, though, a date is stored as a single integer and a datetime is stored as a grouping of two integers, and the software rendering your query results, SSMS or Toad or whatever, uses the datatype designation to determine how to render that piece of information when you extract it from the table.

The same thing is happening when your export file is being read. The intrinsic data conversion is storing dates one way and datetimes another way. They apparently render the same way when you query them, but that's up to the UI. Formatting is not a part of the data; it's a decision the rendering engine makes.

One solution might be to explicitly cast the date values to datetimes, or whatever other format you want them in, on the extraction to the parquet file.

Eric Brandt
  • 7,886
  • 3
  • 18
  • 35