python pandas parse_dates column wildcard for sqlalchemy in pandas 0.14.1?

Question

I'm using the sqlalchemy that allows sql queries for the recently released 0.14.1 version of pandas.

import pandas as pd
from dateutil import parser
from sqlalchemy import create_engine
import datetime

a=[['Datetime', 'Now Date', 'numbers', 'mixed'], ['1/2/2014', datetime.datetime.now(),6, 'z1'], ['1/3/2014', datetime.datetime.now(), 3, 'z1']]
df = pd.DataFrame(a[1:],columns=a[0])
df['Datetime']=df['Datetime'].map(lambda x: parser.parse(x))

engine=create_engine('sqlite:///:memory:')
df.to_sql('db_table',engine, index=False)
df_new=pd.read_sql_query("SELECT * FROM db_table ",engine)

>>> df.dtypes
Datetime    datetime64[ns]
Now Date    datetime64[ns]
numbers              int64
mixed               object
dtype: object

>>> df_new.dtypes
Datetime    object
Now Date    object
numbers      int64
mixed       object
dtype: object

As you can see, my original datetime format is lost when feeding it into the engine. But pandas gives you a way to get it back by parsing.

df_new=pd.read_sql_query("SELECT * FROM db_table ",engine, parse_dates=['Datetime','Now Date'])

>>> df_new.dtypes
Datetime    datetime64[ns]
Now Date    datetime64[ns]
numbers              int64
mixed               object
dtype: object

The problem is i'm feeding different kinds of datetimes into the engine with different column names, I can't manually specify each column name. I have too many things to parse and it is constantly changing. I'm looking for a solution that is the equivalent of something like this:

df_new=pd.read_sql_query("SELECT * FROM db_table ",engine, parse_dates=['*Date*'])

score 1 · Accepted Answer · edited May 23 '17 at 12:07

SQLite has no date or datetime type. So the datetime values are stored as strings and when fetching a query they also come back as strings.
But there are some different options here to deal with this:

Use read_sql_table instead of read_sql_query (if you only need to do "SELECT * FROM ..." or certain columns, and you need no where clause). This will use the information in the table schema and detect it are datetime columns and convert them (sqlalchemy does this):
```
In [13]: df_new2 = pd.read_sql_table("db_table",engine)

In [15]: df_new2.dtypes
Out[15]: 
Datetime    datetime64[ns]
Now Date    datetime64[ns]
numbers              int64
mixed               object
dtype: object
```

You can specify sqlite3.PARSE_DECLTYPES (see docs or this question: How to read datetime back from sqlite as a datetime instead of string in Python?) when using a sqlite connection:

In [33]: con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES) 
In [34]: df.to_sql('db_table', con, index=False)

In [35]: df_new = pd.read_sql_query("SELECT * FROM db_table",con)

In [36]: df_new.dtypes 
Out[36]: 
Datetime    datetime64[ns]
Now Date    datetime64[ns]
numbers              int64
mixed               object
dtype: object

This does not seem to work very nice with sqlalchemy (http://docs.sqlalchemy.org/en/rel_0_9/dialects/sqlite.html#compatibility-with-sqlite3-native-date-and-datetime-types)

You can do the parsing afterwards, to do this automatically on all columns that contain 'Date':

In [45]: date_cols = [col for col in df.columns if 'Date' in col]

In [47]: for col in date_cols:
   ....:     df[col] = pd.to_datetime(df[col])
   ....:

python pandas parse_dates column wildcard for sqlalchemy in pandas 0.14.1?

1 Answers1