Can't properly read SQL table in python: varchar columns imported as comma-separated characters / tuples

Question

I'm connecting to a Oracle database using the following code:

jar = ojdbc8.jar path
jvm_path = jvm.dll path
args = '-Djava.class.path=%s' % jar
jpype.startJVM(jvm_path, args)
con = jaydebeapi.connect("oracle.jdbc.driver.OracleDriver", url,[user, password], jar)

The connection works fine, however the data is returned in this odd format.

pd.read_sql("SELECT * FROM table1", con)

yields

+---+-----------------+-----------------+-----------------+
|   | (C,O,L,U,M,N,1) | (C,O,L,U,M,N,2) | (C,O,L,U,M,N,3) |
+---+-----------------+-----------------+-----------------+
| 1 | (t,e,s,t)       | (t,e,s,t,2)     | 1               |
+---+-----------------+-----------------+-----------------+
| 2 | (f,o,o)         | (b,a,r)         | 100             |
+---+-----------------+-----------------+-----------------+

The number and dates are imported correctly, but not the varchar columns. I tried different tables and all of them have this problem.

I haven't seen anything like that anywhere. Hope you can help me.

buddemat · Accepted Answer · 2021-04-06T09:14:57.937

This seems to be a problem when using jaydebeapi with jpype. I can reproduce this when connecting to a Oracle db in the same way that you do (in my case Oracle 11gR2, but since you are using ojdbc8.jar, I guess it also happens with other versions).

There are different ways you can solve this:

Change your connection

Since the error only seems to occur in a specific combination of packages, the most sensible thing to do is to try and avoid these and thus the error altogether.

Alternative 1: Use jaydebeapi without jpype:

As noted, I only observe this when using jaydebeapi with jpype. However, in my case, jpype is not needed at all. I have the .jar file locally and my connection works fine without it:

import jaydebeapi as jdba
import pandas as pd
import os

db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'

jar=os.getcwd()+'/ojdbc6.jar'

conn = jdba.connect('oracle.jdbc.driver.OracleDriver', 
                'jdbc:oracle:thin:@' + db_host + ':' + str(db_port) + ':' + db_sid, 
                {'user': 'USERNAME', 'password': 'PASSWORD'}, 
                jar
                )

df_jay = pd.read_sql('SELECT * FROM YOURSID.table1', conn)

conn.close()

In my case, this works fine and creates the dataframes normally.

Alternative 2: Use cx_Oracle instead:

The issue also does not occur if I use cx_Oracle to connect to the Oracle db:

import cx_Oracle
import pandas as pd
import os

db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'

dsn_tns = cx_Oracle.makedsn(db_host, db_port, db_sid)
cx_conn = cx_Oracle.connect('USERNAME', 'PASSWORD', dsn_tns)

df_cxo = pd.read_sql('SELECT * FROM YOURSID.table1', con=cx_conn)

cx_conn.close()

Note: For cx_Oracle to work you have to have the Oracle Instant Client installed and properly set up (see e.g. cx_Oracle documentation for Ubuntu).

Fix dataframe after the fact:

If for some reason, you cannot use the above connection alternatives, you can also transform your dataframe.

Alternative 3: join tuple entries:

You can use ''.join() to convert tuples to strings. You need to do this for the entries and the column names.

# for all entries that are not None, join the tuples
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].apply(lambda x: ''.join(x) if x is not None else x)

# also rename the column headings in the same way
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)

Alternative 4: change dtype of columns:

By changnig the dtype of an affected column from object to string, all entries will also be converted. Note that this may have unwanted side-effects, like e.g. changing None values to the string <N/A>. Also, you will have to rename the column headings separately, as above.
```
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].astype('string')

# again, rename headings
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
```

All of these should yield more or less the same df in the end (apart from the dtypes and possible replacement of None values):

+---+---------+---------+---------+
|   | COLUMN1 | COLUMN2 | COLUMN3 |
+---+---------+---------+---------+
| 1 | test    | test2   | 1       |
+---+---------+---------+---------+
| 2 | foo     | bar     | 100     |
+---+---------+---------+---------+

Regarding Instant Client setup, the [cx_Oracle installation documentation](https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html#oracle-instant-client-zip-files) contains all you need to know and do. The post you linked to does things like set ORACLE_HOME - which you should never set when using Instant Client (it can cause the wrong config files to be read). And recent versions of Instant Client create the necessary sym links, so that step isn't needed. — Christopher Jones, Apr 06 '21 at 01:27
Thanks for pointing that out. I have updated the post accordingly. — buddemat, Apr 06 '21 at 06:48
I had this problem with DB2 and ibm_db just now after upgrading Python from 3.6 to 3.8. I commented out the jpype code to start up the JVM, and it worked. What I don't understand is why removing jpype worked. — David Gaertner, Apr 17 '21 at 00:09

score 0 · Answer 2 · answered Jan 12 '23 at 12:23

0

Try starting JVM with convertStrings parameter.

jpype.startJVM(jvm_path, args, convertStrings=True)

See https://jpype.readthedocs.io/en/latest/userguide.html#string-conversions for details

answered Jan 12 '23 at 12:23

Untrue

1

Can't properly read SQL table in python: varchar columns imported as comma-separated characters / tuples

2 Answers2

Change your connection

Fix dataframe after the fact:

Linked