
I'm trying this:

import pandas as pd
import sqlite3
import datetime, pytz

#nowtime=datetime.datetime.now(pytz.utc)
nowtime=datetime.datetime.now()

print(nowtime)
df = pd.DataFrame(columns=list('ABCD'))
df.loc[0]=(3,0.141,"five-nine",nowtime)
df.loc[1]=(1,0.41,"four-two",nowtime)

print(df)

db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('create table if not exists ABCD ( A integer, B real, C text, D timestamp );')
c.execute('insert into ABCD (A,B,C, D) values (?,?,?,?);',(1,2.2,'4',nowtime))
c.executemany('insert into ABCD (A,B,C, D) values (?,?,?,?);',df.to_records(index=False))

db.commit()

print(pd.read_sql('select * from ABCD;',db))

and getting this:

 2018-03-07 19:09:58.584953
   A      B          C                          D
0  3  0.141  five-nine 2018-03-07 19:09:58.584953
1  1  0.410   four-two 2018-03-07 19:09:58.584953
   A      B          C                           D
0  1  2.200          4  2018-03-07 19:09:58.584953
1  3  0.141  five-nine    b'\xa8hx?\t\xb9\x19\x15'
2  1  0.410   four-two    b'\xa8hx?\t\xb9\x19\x15'

Ideally, I'd like to push some data with timestamps into sqlite3 and recover it back into pandas/python/numpy interoperably.

I've seen Appending Pandas dataframe to sqlite table by primary key for appending, but I'm not sure how to work with datetime.datetime, pandas Timestamps or numpy.datetime64 times with sqlite3.

Also, there was How to read datetime back from sqlite as a datetime instead of string in Python? but I couldn't figure out how to do it in pandas.

One thing I spent a lot of time on was https://stackoverflow.com/a/21916253/1653571 and its confusing multiple to_datetime() calls.

What's a good way to work with times, sqlite3, and pandas?

### Update:

I tried these changes:

db = sqlite3.connect(':memory:',detect_types=sqlite3.PARSE_DECLTYPES)

#...
for index,row in df.iterrows():
    print(row)
    c.execute('insert into ABCD (A,B,C,D) values (?,?,?,?);',(row.A,row.B,row.C,row.D.to_pydatetime()))


x = pd.read_sql('select *  from ABCD;',db)

print('Type of a pd.read_sql(SQLite3) timestamp  : ',type(x['D'][0]))

x = c.execute('select * from ABCD').fetchall()

print(x)
print('Type of a sqlite.execute(SQLite3) timestamp  : ',type(x[0][3]))

to use the SQLite3 datatypes and to test the returned values:

Type of a pd.read_sql(SQLite3) timestamp  :  <class 'pandas._libs.tslib.Timestamp'>
[(1, 2.2, '4', datetime.datetime(2018, 3, 8, 14, 46, 2, 520333)), (3, 141.0, 'five-nine', datetime.datetime(2018, 3, 8, 14, 46, 2, 520333)), (1, 41.0, 'four-two', datetime.datetime(2018, 3, 8, 14, 46, 2, 520333))]
Type of a sqlite.execute(SQLite3) timestamp  :  <class 'datetime.datetime'>

Also, I tried datetime.datetime.now(pytz.utc) to get timezone-aware UTC times, but it broke lots of stuff. Using datetime.datetime.utcnow() worked much better: it returns a naive (not timezone-aware) object that isn't affected by timezone conversions.

Note also the Python sqlite3 documentation on the sqlite3.connect(detect_types=...) parameter. Enabling detect_types=PARSE_DECLTYPES|PARSE_COLNAMES tells Python to run the registered converters on data passed between the two systems.
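
For reference, here is a minimal sketch of the round trip with detect_types enabled, consolidating the behavior above (it reuses the ABCD layout from my example; the exact return types are what I observed and may vary across pandas versions):

import sqlite3, datetime
import pandas as pd

# PARSE_DECLTYPES makes sqlite3 run its default "timestamp" converter on any
# column declared as timestamp, so values come back as datetime.datetime.
db = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
c = db.cursor()
c.execute('create table ABCD ( A integer, B real, C text, D timestamp );')

now = datetime.datetime.utcnow()   # naive UTC, per the note above
c.execute('insert into ABCD (A,B,C,D) values (?,?,?,?);', (1, 2.2, '4', now))
db.commit()

row = c.execute('select * from ABCD;').fetchone()
print(type(row[3]))                # <class 'datetime.datetime'>

x = pd.read_sql('select * from ABCD;', db)
print(type(x['D'][0]))             # pandas Timestamp, dtype datetime64[ns]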

Dave X

2 Answers


The main issue is that SQLite doesn't have a datetime datatype.

PARSE_DECLTYPES can't help when reading out of SQLite because the declared datatypes of columns in SQLite will never be datetimes.

Since you are in control of the Pandas dataframe, you know the types at the point in time you are saving them back to SQLite.

the read_sql method you are using...

is a convenience wrapper around read_sql_table and read_sql_query (and for backward compatibility) and will delegate to the specific function depending on the provided input (database table name or SQL query).

In your example you've provided a query, so it is delegating to the read_sql_query method: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql_query.html#pandas.read_sql_query

This has a parameter parse_dates which can be:

Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime() Especially useful with databases without native Datetime support, such as SQLite

Since you know ahead of time which columns are datetimes, you can store those as a dict with the structure that parse_dates expects, and just pass it into the read_sql method.
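
For example, a sketch against the ABCD table from the question, using a plain connection (no detect_types); the format string assumes the space-separated ISO strings shown in the question's output:

# parse_dates tells read_sql_query to run pandas.to_datetime() on column D,
# turning the TEXT timestamps SQLite hands back into datetime64[ns] values.
x = pd.read_sql('select * from ABCD;', db,
                parse_dates={'D': {'format': '%Y-%m-%d %H:%M:%S.%f'}})
print(x.dtypes)   # D should now be datetime64[ns]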

In other situations where I've saved a pandas df back to a csv or other file, I've used something like this to save the schema so I can reapply the types when loading the csv back into pandas. The read_csv method has a dtype parameter that takes exactly the structure below.

def getPandasSchema(df):
    ''' 
    takes a pandas dataframe and returns the dtype dictionary
    useful for applying types when reloading that dataframe from csv etc
    '''
    return dict(zip(df.columns.tolist(),df.dtypes.tolist()))
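
For instance, a rough usage sketch (the file name is illustrative; note that pandas generally wants datetime columns passed via parse_dates rather than dtype when reading a csv):

schema = getPandasSchema(df)
df.to_csv('abcd.csv', index=False)

# split datetime columns out of the schema, since read_csv's dtype argument
# does not accept datetime64; those columns go through parse_dates instead
dt_cols = [col for col, typ in schema.items()
           if pd.api.types.is_datetime64_any_dtype(typ)]
other = {col: typ for col, typ in schema.items() if col not in dt_cols}

df2 = pd.read_csv('abcd.csv', dtype=other, parse_dates=dt_cols)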
Davos

The issue derives from pandas' to_records(), which converts your datetime field into an ISO timestamp with a T separator:

print(df.to_records(index=False))
# [(3, 0.141, 'five-nine', '2018-03-07T20:40:39.808427000')
#  (1, 0.41 , 'four-two', '2018-03-07T20:40:39.808427000')]

Consider converting the datetime column to string and then running the cursor's executemany():

df.D = df.D.astype('str')

print(df.to_records(index=False))
# [(3, 0.141, 'five-nine', '2018-03-07 20:40:39.808427')
#  (1, 0.41 , 'four-two', '2018-03-07 20:40:39.808427')]

Altogether:

db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('create table if not exists ABCD ( A integer, B real, C text, D timestamp );')
c.execute('insert into ABCD (A,B,C, D) values (?,?,?,?);',(1,2.2,'4',nowtime))

df['D'] = df['D'].astype('str')
c.executemany('insert into ABCD (A,B,C, D) values (?,?,?,?);',df.to_records(index=False))

db.commit()
print(pd.read_sql('select * from ABCD;',db))

#    A      B          C                           D
# 0  1  2.200          4  2018-03-07 20:47:15.031130
# 1  3  0.141  five-nine  2018-03-07 20:47:15.031130
# 2  1  0.410   four-two  2018-03-07 20:47:15.031130
Parfait
  • I'm not sure I like `str` as the datatype. Won't it be slower and more complicated to re-parse back into `pandas.Timestamp` or `datetime.datetime`? – Dave X Mar 08 '18 at 15:00
  • 1
    Did you check the `dtypes` after `read_sql` which should derive from SQLite types? The conversion to string was simply to migrate into SQLite. Otherwise do not use `to_records()`, see into `iterrows`. Try even [`to_sql`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html) method with an SQLAlchemy connection. – Parfait Mar 08 '18 at 15:25
  • I found that the returned datatypes are `str` either way unless I add a `detect_types=sqlite3.PARSE_DECLTYPES` parameter to the `db.connect()`. With that param, `c.execute(...).fetch...` returns `datetime.datetime` and `pd.read_sql` returns `pandas._libs.tslib.Timestamp`. The SQLite types seem completely ignored unless the parameter is set. – Dave X Mar 08 '18 at 15:56
  • Saving it to a file and examining the file with 'DB browser for SQLite' indicates that in all the cases I tried, SQLite saved the dates as strings of characters. – Dave X Mar 08 '18 at 15:59
  • 1
    Being a lightweight, file-level DB, SQLite only has few [data types](https://www.sqlite.org/datatype3.html): `TEXT, NUMERIC, INTEGER, REAL, BLOB.` There is no *timestamp* type and likely is saved into its closest affinity class being text. – Parfait Mar 08 '18 at 17:11
  • Yes, it is stored internally as text, but it seems one needs some `detect_types` magic to cause parsing back from text into non-string time-aware types. Maybe there's some adapter magic that could be written to choose a numeric or real type for pandas timestamps, per https://docs.python.org/3/library/sqlite3.html#using-adapters-to-store-additional-python-types-in-sqlite-databases but the builtin default adapters look like they are for datetime.* and are stored as strings per https://docs.python.org/3/library/sqlite3.html#default-adapters-and-converters – Dave X Mar 08 '18 at 18:03
  • 1
    @DaveX It's not only stored internally as `TEXT`, SQLite has no concept of it being anything else but `TEXT`. When you see things like `datetime.datetime` returned from `PARSE_DECLTYPES` that is because you are reading from Pandas which does have those types. They will always be stored as strings in SQLite as you've found. There is no solution on the SQLite side, your problem exists when reading SQLite back to Pandas, so the datatype conversion needs to happen on the Pandas side. – Davos Apr 05 '18 at 07:32