ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]

Question

I'm trying to create a PostgreSQL table of 30-minute data for the S&P 500 ETF (spy30new, for testing freshly inserted data) from a table of several stocks with 15-minute data (all15). all15 has an index on 'dt' (timestamp) and 'instr' (stock symbol). I would like spy30new to have an index on 'dt'.

import numpy as np
import pandas as pd
from datetime import datetime, date, time, timedelta
from dateutil import parser
from sqlalchemy import create_engine

# Query all15
engine = create_engine('postgresql://user:passwd@localhost:5432/stocks')
new15Df = (pd.read_sql_query("SELECT dt, o, h, l, c, v FROM all15 WHERE (instr = 'SPY') AND (date(dt) BETWEEN '2016-06-27' AND '2016-07-15');", engine)).sort_values('dt')
# Correct for Time Zone.
new15Df['dt'] = (new15Df['dt'].copy()).apply(lambda d: d + timedelta(hours=-4))

# spy0030Df contains the 15-minute data at 00 & 30 minute time points
# spy1545Df contains the 15-minute data at 15 & 45 minute time points
spy0030Df = (new15Df[new15Df['dt'].apply(lambda d: d.minute % 30) == 0]).reset_index(drop=True)
spy1545Df = (new15Df[new15Df['dt'].apply(lambda d: d.minute % 30) == 15]).reset_index(drop=True)

high = pd.concat([spy1545Df['h'], spy0030Df['h']], axis=1).max(axis=1)
low = pd.concat([spy1545Df['l'], spy0030Df['l']], axis=1).min(axis=1)
volume = spy1545Df['v'] + spy0030Df['v']

# spy30Df assembled and pushed to PostgreSQL as table spy30new
spy30Df = pd.concat([spy0030Df['dt'], spy1545Df['o'], high, low, spy0030Df['c'], volume], ignore_index = True, axis=1)
spy30Df.columns = ['d', 'o', 'h', 'l', 'c', 'v']
spy30Df.set_index(['dt'], inplace=True)
spy30Df.to_sql('spy30new', engine, if_exists='append', index_label='dt')

This gives the error "ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]"
What I've tried so far (I have successfully pushed CSV files to PG using pandas. But here the source is a PG database):

Not placing an index on 'dt'

spy30Df.set_index(['dt'], inplace=True)  # Remove this line
spy30Df.to_sql('spy30new', engine, if_exists='append')  # Delete the index_label option

Converting 'dt' from type pandas.tslib.Timestamp to datetime.datetime using to_pydatetime() (in case psycopg2 can work with python dt, but not pandas Timestamp)

u = (spy0030Df['dt']).tolist()
timesAsPyDt = np.asarray(map((lambda d: d.to_pydatetime()), u))
spy30Df = pd.concat([spy1545Df['o'], high, low, spy0030Df['c'], volume], ignore_index = True, axis=1)
newArray = np.c_[timesAsPyDt, spy30Df.values]
colNames = ['dt', 'o', 'h', 'l', 'c', 'v']
newDf = pd.DataFrame(newArray, columns=colNames)
newDf.set_index(['dt'], inplace=True)
newDf.to_sql('spy30new', engine, if_exists='append', index_label='dt')

Using datetime.utcfromtimestamp()

timesAsDt = (spy0030Df['dt']).apply(lambda d: datetime.utcfromtimestamp(d.tolist()/1e9))

Using pd.to_datetime()

timesAsDt = pd.to_datetime(spy0030Df['dt'])

Rock Pereira · Accepted Answer · 2018-04-08T08:12:11.727

8

Using pd.to_datetime() on each element worked. Option 4, which doesn't work, applies pd.to_datetime() to the entire series. Perhaps the Postgres driver understands python datetime, but not datetime64 in pandas & numpy. Option 4 produced the correct output, but I got ValueError (see title) when sending the DF to Postgres

timesAsPyDt = (spy0030Df['dt']).apply(lambda d: pd.to_datetime(str(d)))

edited Apr 08 '18 at 08:12

answered Jul 22 '16 at 15:36

Rock Pereira

471
1
4
12

Wilhelm · Answer 2 · 2018-04-11T12:19:50.057

I had the same problem and applying pd.to_datetime() on each element worked as well. But it is orders of magnitude slower than running pd.to_datetime() on the entire series. For a dataframe with over a 1 million rows:

(df['Time']).apply(lambda d: pd.to_datetime(str(d)))

takes approximately 70 seconds

and

pd.to_datetime(df['Time'])

takes approximately 0.01 seconds

The actual problem is that timezone information is being included. To remove it:

t = pd.to_datetime(df['Time'])
t = t.tz_localize(None)

This should be much faster!

Praveenkumar Beedanal · Answer 3 · 2018-08-15T13:34:24.127

Actually, this was my data frame.

                              Biomass  Fossil Brown coal/Lignite  Fossil Coal-derived gas  Fossil Gas  Fossil Hard coal  Fossil Oil  Geothermal  Hydro Pumped Storage  Hydro Run-of-river and poundage  Hydro Water Reservoir  Nuclear   Other  Other renewable    Solar  Waste  Wind Offshore  Wind Onshore
2018-02-02 00:00:00+01:00   4835.0                    16275.0                    446.0      1013.0            4071.0       155.0         5.0                   7.0                           1906.0                   35.0   8924.0  3643.0            142.0      0.0  595.0         2517.0       19999.0
2018-02-02 00:15:00+01:00   4834.0                    16272.0                    446.0      1010.0            3983.0       155.0         5.0                   7.0                           1908.0                   71.0   8996.0  3878.0            142.0      0.0  594.0         2364.0       19854.0
2018-02-02 00:30:00+01:00   4828.0                    16393.0                    446.0      1019.0            4015.0       155.0         5.0

I was trying to insert into SQL database but getting the same error as in the above question. What i have done is, convert the index of the data frame to the column with a label 'index'.

df.reset_index(level=0, inplace=True)

Rename the column name 'index' to 'DateTime' by using this code.

df = df.rename(columns={'index': 'DateTime'})

Change the datatype to the 'datetime64'.

df['DateTime'] = df['DateTime'].astype('datetime64')

Store it in the sql database using these code.

engine = create_engine('mysql+mysqlconnector://root:Password@localhost/generation_data', echo=True)
df.to_sql(con=engine, name='test', if_exists='replace')

ValueError: Cannot cast DatetimeIndex to dtype datetime64[us]

3 Answers3

Linked