1

I am trying to use pandas to select some data from an Oracle database. The column in question has the data type TIMESTAMP(6) WITH TIME ZONE. I am in the same time zone as the database, but it contains data that is recorded from a different time zone.

Oracle version: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
Python 3.8.13
SQLAlchemy 1.4.39
cx_Oracle 8.3.0

In PL/SQL Developer, the query works:

SELECT col
FROM table

Returns

18-JAN-21 09.54.58.000000000 PM ASIA/BANGKOK

In Python, I get this error:

import sqlalchemy
import cx_Oracle

server = server
port = port
sid = sid
username = username
password = password
dsn_tns = cx_Oracle.makedsn(server, port, sid)
cnxn = cx_oracle.connect(username, password, dsn_tns)

query = """
    SELECT col
    FROM table
"""
df = pd.read_sql_query(query, cnxn)

Output:

DatabaseError: ORA-01805: possible error in date/time operation

After some SO searching, I tried this:

query = """
    SELECT CAST(TO_TIMESTAMP_TZ(
        col, 
        'DD-MMM-YY HH.MI.SS.FF6 TZH TZR')
) AT TIME ZONE 'ASIA/BANGKOK' AS col
    FROM table
"""

df = pd.read_sql_query(query, cnxn_tds_dev)

Which returns a different error message:

ORA-00905: missing keyword

How can I just select this timestamp column (and several others) using Python/SQLAlchemy/cx_Oracle? Because the query works in PL/SQL Developer, I am assuming it is an issue with cx_Oracle. I will try creating a new Python environment with an older version of cx_Oracle, per this post.

Evan
  • 2,121
  • 14
  • 27
  • Verified that downgrading to cx_Oracle 6.4.1 and Python 3.7 results in a working query. Leaving this question open in order to find a better resolution to the issue. – Evan Nov 24 '22 at 00:29
  • Did you change Oracle Client libraries when you downgraded? Did you change Pandas / SQLAlchemy versions? – Christopher Jones Nov 24 '22 at 00:48
  • @ChristopherJones I created a completely new python environment, so I suspect the versions of other libraries also changed. New env is `Python 3.7.15`, `SQLAlchemy 1.4.39`, `cx_Oracle 6.4.1`, and `pandas 1.3.5`. – Evan Nov 24 '22 at 00:52
  • 1
    Can you print out the value of `cx_Oracle.clientversion()` in both cases? – Anthony Tuininga Nov 24 '22 at 01:06
  • @AnthonyTuininga it is `(19, 5, 0, 0, 0)` in both cases. – Evan Nov 24 '22 at 01:07
  • Using `Python 3.11`, it is possible to select the troublesome `TIMESTAMP WITH TIME ZONE` as a string, but I'm not sure how to use `pd.to_datetime()` to convert it back. `pd.to_datetime(df["col"], format="%d-%m-%Y %H.%M.%S.%f %p %Z")` gives an error, presumably because of an incorrect format string. – Evan Nov 24 '22 at 01:22
  • OK, I can at least select the data as a string and convert it to a tz-aware `datetime`. `pd.to_datetime("09-MAR-22 06.04.07.000000 PM ASIA/BANGKOK", format="%d-%b-%y %I.%M.%S.%f %p %Z")` works. I don't know if this is a robust solution; it assigns the dtype and tz to the whole column, rather than to each row. In the future, I expect to have >= 3 timezones present in this column. – Evan Nov 24 '22 at 01:27
  • 1
    How about upgrading SQLAlchemy? I just tried with Python 3.9, pandas 1.5.2, sqlalchemy 1.4.44, cx_Oracle 8.3.0, Instant Client 19.8, Oracle DB 23Cc Beta and got expected output – Christopher Jones Nov 24 '22 at 01:28
  • @ChristopherJones Interesting - I am testing this with the those versions of pandas, sqlalchemy, and cx_Oracle ( as of right now), and I get the original error in the question. It looks like casting to a string and then converting in pandas works, so I'll leave it here for now. It does at least solve the problem of getting data out of the database in my usual Python env. – Evan Nov 24 '22 at 01:35
  • 1
    I'll post my code for your comparison. – Christopher Jones Nov 24 '22 at 01:37
  • 1
    I believe the source of the issue is due to the fact that in cx_Oracle 8.0 the timestamp with time zone type was handled natively (by the Oracle Client libraries) whereas previously it was coerced to a naive timestamp (conversion performed on the server). The problem, though, is that the client has *different* timezone files than the server and so you get this error. If you can get newer timezone files for the server the problem should go away. – Anthony Tuininga Nov 24 '22 at 20:49

2 Answers2

1

For the record, the code I mentioned in the original comment thread is:

# create table t (c TIMESTAMP(6) WITH TIME ZONE);
# insert into t (c) values (systimestamp);
# commit;
#
# Name: pandas
# Version: 1.5.2
# Name: SQLAlchemy
# Version: 1.4.44
# Name: cx-Oracle
# Version: 8.3.0
#
# Output is like:
#  0 2022-11-24 11:49:25.505773

import os
import platform

from sqlalchemy import create_engine
import pandas as pd

import cx_Oracle

if platform.system() == "Darwin":
    cx_Oracle.init_oracle_client(lib_dir=os.environ.get("HOME")+"/Downloads/instantclient_19_8")

username = os.environ.get("PYTHON_USERNAME")
password = os.environ.get("PYTHON_PASSWORD")
connect_string = os.environ.get("PYTHON_CONNECTSTRING")
hostname, service_name = connect_string.split("/")

engine = create_engine(f'oracle://{username}:{password}@{hostname}/?service_name={service_name}')

query = """select * from t"""
df = pd.read_sql_query(query, engine)
print(df)
Christopher Jones
  • 9,449
  • 3
  • 24
  • 48
  • We used slightly different methods to create the connection. I'll compare the two later to see if that removes the error. – Evan Nov 24 '22 at 04:06
  • Confirm that this connection method generates the same error message. – Evan Nov 24 '22 at 14:24
-1

One solution is to cast the problematic columns as strings, then convert in pandas.

query = "SELECT TO_CHAR(col) AS col FROM table"
df = pd.read_sql_query(query, cnxn)
df[col] = df[col].apply(pd.to_datetime, format="%d-%b-%y %I.%M.%S.%f %p %Z")
Evan
  • 2,121
  • 14
  • 27