
When using read_sql in pandas, the function requires you to identify which columns should be treated as dates (see the snippet below). But I don't know beforehand which columns are dates.

Given a SELECT statement, I need to retrieve its column types and load the result set into a pandas DataFrame. I was expecting pandas to identify the columns of type date and assign that type when the DataFrame is created from the SELECT.

What is the best way to identify the date-typed columns in the SELECT statement or in the DataFrame? I'm trying to avoid running the statement with pyodbc first just to detect the types.

import pyodbc
import pandas as pd 

conn = pyodbc.connect....

sql_df = pd.read_sql(
    "SELECT * FROM my_table",
    conn,
    parse_dates=[
        'created_at',
        'updated_at'
    ]
)
ps0604
  • See: https://stackoverflow.com/a/39353121/9142735 – Juan Estevez Oct 03 '20 at 01:21
  • Juan, this link doesn't provide the answer – ps0604 Oct 03 '20 at 12:31
  • How is it you don't know the data types beforehand? Databases are planned resources with known schema. Tables should not be dynamically built on the fly and having hundreds to thousands of columns is likely a table design issue. In fact, using `*` may not be efficient as you can be retrieving large open-ended text and CLOB/BLOB fields. See [Why is SELECT * considered harmful?](https://stackoverflow.com/q/3639861/1422451) – Parfait Oct 03 '20 at 14:41

2 Answers


Consult sql_df.dtypes. It will be pretty clear about which columns the RDBMS views as datestamps.

For example, when querying a postgres backend, you may find types like this:

[1024 rows x 2 columns]
created    datetime64[ns]
name               object
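Once the frame is loaded, you can pull the datetime-typed columns out programmatically rather than reading dtypes by eye. A minimal sketch using select_dtypes on an in-memory frame standing in for the read_sql result:

```python
import pandas as pd

# Small frame standing in for the output of pd.read_sql
df = pd.DataFrame({
    "created": pd.to_datetime(["2020-10-01", "2020-10-02"]),
    "name": ["a", "b"],
})

# select_dtypes picks out columns already typed as datetime64[ns]
date_cols = df.select_dtypes(include=["datetime64[ns]"]).columns.tolist()
print(date_cols)  # ['created']
```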

You may find it helpful to use a postgres cast from date to timestamp:

SELECT created::TIMESTAMP, ... FROM ...

If you're querying a relation that stores time information as VARCHAR, well, there's going to be a little pain involved. You really want to store time data in a suitable column type.
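If the column did come back as object-typed strings, you can coerce it after the fact with pd.to_datetime. A sketch, where errors="coerce" turns unparseable values into NaT instead of raising:

```python
import pandas as pd

# Column read back from the database as plain strings
df = pd.DataFrame({"created": ["2020-10-01", "not a date", "2020-10-03"]})

# Coerce to datetime; bad values become NaT rather than raising an error
df["created"] = pd.to_datetime(df["created"], errors="coerce")
print(df["created"].dtype)  # datetime64[ns]
```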

You may find a sqlserver cast helpful, such as:

SELECT CAST (created AS datetime), ... FROM ...
J_H
  • I did, the type of a date in the select statement is 'object', the same as a string. Cannot tell the difference – ps0604 Oct 03 '20 at 01:52
  • The database is SQL Server, and the type of the column is DATE. I simply select the column and in `dtypes` the value is `object` – ps0604 Oct 03 '20 at 11:41

If you use pyodbc with MS SQL Server and pandas, then for df.dtypes to report a date column as a datetime type, you need to define the column in SQL Server as datetime; df.dtypes will then show datetime64[ns]. A column defined as DATE comes back as object.
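Since the schema lives in the database, one way to avoid hard-coding parse_dates is to query the column metadata first and build the list dynamically. A hedged sketch against SQL Server's INFORMATION_SCHEMA (the table name 'my_table' is from the question; the read_sql calls are commented out because they need a live connection):

```python
import pandas as pd

# Find the date/datetime columns of the table from its catalog metadata
type_sql = """
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'my_table'
      AND DATA_TYPE IN ('date', 'datetime', 'datetime2', 'smalldatetime')
"""

# With a live pyodbc connection `conn`, the two steps would be:
# date_cols = pd.read_sql(type_sql, conn)["COLUMN_NAME"].tolist()
# sql_df = pd.read_sql("SELECT * FROM my_table", conn, parse_dates=date_cols)
```

This runs one extra metadata query, but it means parse_dates always matches the actual schema, including DATE columns that would otherwise land as object.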

ps0604