I have my code here:

import pandas as pd

df = pd.read_parquet(r"C:\path\to\parquet.parquet", engine='fastparquet')
print(df)

And my source data looks like this:

date
----
2022-02-10
2022-05-03
2164-09-09

My target data in the parquet file looks like this:

date
----
2022-02-10 00:00:00.00000000
2022-05-03 00:00:00.00000000
2164-09-09 00:52:00.03019401

I am migrating data from SQL Server to AWS S3 via DMS, and it is being stored as parquet files. Why is pandas automatically converting the date to datetime64[ns]? I know pandas has no dedicated date dtype, so the column is read as datetime64[ns]. But why isn't the YYYY-MM-DD value being retained as-is? For some columns it is, and for others a time is shown along with the date. I'm confused. Is there any documentation on this?

  • Dates and datetimes in databases are not stored in a user display format - they are stored in a binary format. Hence when you view them, you just get the default display format for the datatype and the UI you are using to view them. If the format **IS** being retained then you probably have the data stored as a string rather than a date/datetime. – Dale K Apr 06 '23 at 21:52
  • Hm, OK, but for the columns where that's not happening, the datatype is "datetime", not string, when I checked using pandas. – Stack_mobile2 Apr 06 '23 at 21:54
  • I am using Visual Studio Code to view the data. – Stack_mobile2 Apr 06 '23 at 22:04
  • I think your question is misleading: the format "YYYY-MM-DD" is being maintained for the date part, and you understand why it's turning into a datetime. I guess your question is just why a time component is being added in some cases? – Dale K Apr 06 '23 at 22:08
  • Yes! That's my question. Why is a time being added in some cases? – Stack_mobile2 Apr 06 '23 at 22:09
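To narrow down where the unexpected time component appears, one option is to flag the rows whose timestamp is not exactly midnight. The snippet below is only a rough sketch: it builds a small stand-in frame from the values shown in the question instead of reading the real parquet file, and the column name `date` and the helper column `date_only` are assumptions for illustration. It does not explain why the time is there, only which rows carry one.

```python
import pandas as pd

# Stand-in for the "date" column read from the parquet file
# (values copied from the output shown in the question).
df = pd.DataFrame({
    "date": [
        pd.Timestamp("2022-02-10 00:00:00"),
        pd.Timestamp("2022-05-03 00:00:00"),
        pd.Timestamp("2164-09-09 00:52:00.030194010"),
    ]
})

# normalize() resets the time to midnight, so any row that differs
# from its normalized value carries a non-zero time component.
has_time = df["date"] != df["date"].dt.normalize()
print(df[has_time])

# If only the calendar date matters downstream, keep just the date part.
# This yields an object column of datetime.date values, not datetime64.
df["date_only"] = df["date"].dt.date
print(df["date_only"])
```

Running the same check against the real column and comparing the flagged rows with the SQL Server source should show whether the time was already present in the parquet file or only appears after loading.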