3

I am using jpype and jaydebeapi to connect to an Oracle database which works fine. However, the SQL returns tuples instead of strings with brackets and a comma after each character.

E.g. instead of dog it returns (d,o,g,)

Running the same SQL statement in DBeaver or Toad returns the string without alteration as you would expect.

Any idea on how to return the string without brackets and comma please? (I can still use the data but it looks weird in a chart and makes the chart and other outputs harder to read).

I have already searched Google and stackoverflow for similar issues (link1,link2,link3) but non of these solutions have worked for me. I need to find a way to loop through all columns to find those with dtype string and then join all tuples in those.

Here is a screenshot of how the data is being returned:

screenshot of dataframe, illustrating the problem

Here is a screenshot of a pivot (showing the data can be used without using brackets and commas):

screenshot of table, illustrating the problem

Here is a screenshot of a chart (again showing string with brackets and commas):

screenshot of chart, illustrating the problem

The code I'm using for the connection is as follows (no issues with this as far as I can see?):

import jpype, jaydebeapi 

import pandas as pd

jpype.startJVM(f"-Djava.class.path=D:\jdbc\ojdbc8.jar")

url = "jdbc:oracle:thin:@(DESCRIPTION=(CONNECT_TIMEOUT=1000)(TRANSPORT_CONNECT_TIMEOUT=1000)(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST="")(PORT=""))(ADDRESS=(PROTOCOL=TCP)(HOST="")(PORT=""))(ADDRESS=(PROTOCOL=TCP)(HOST="")(PORT="")))(ADDRESS_LIST=(LOAD_BALANCE=ON)(ADDRESS=(PROTOCOL=TCP)(HOST="")(PORT=""))(ADDRESS=(PROTOCOL=TCP)(HOST="")(PORT=""))(ADDRESS=(PROTOCOL=TCP)(HOST="")(PORT="")))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME="")))"

user = "" 
password = "" 

conn=jaydebeapi.connect("oracle.jdbc.OracleDriver", url, [user, password])

cursor = conn.cursor()

sql_query = ''''''

cursor.execute(sql_query)
data = cursor.fetchall()

columns = [desc[0] for desc in cursor.description]
columns = [''.join(i) for i in columns]

cursor.close()
data = pd.DataFrame(data)
data.columns = columns

pd.options.display.float_format = '{:,.0f}'.format
pivot = pd.pivot_table(data, index=['COUNTRY'], values=['CUSTOMERS'], aggfunc=(sum))

pivot.plot(kind='bar')

conn.close()
jpype.shutdownJVM()

Here is what I have tried so far:

for col in data.columns:
    for ix in data.index:
        data[ix][col] = ''.join(data[ix][col])

This returns a

Keyerror:0

data.select_dtypes(include=['object']).agg(' '.join)

#and

data.select_dtypes(include=['object']).apply(lambda x: ''.join(x))

Both of these return a

TypeError: sequence item 0: expected str instance, java.lang.String found

I have of course looked up these error messages but can't tell what I'm doing wrong.

Would really appreciate some help!

Thanks!!!

UPDATE: I just noticed that when I turn dtype object into category (which I did so I could order the column in a pivot table) it shows the string without brackets and commas. So then I tried to convert all object columns into category using this code per this post link4:

data = pd.concat([
        data.select_dtypes([], ['object']),
        data.select_dtypes(['object']).apply(pd.Series.astype, dtype='category')
        ], axis=1).reindex(data.columns, axis=1) 

However, now I am getting the following error:

ValueError: setting an array element with a sequence

If I do each column one by one it seems to be working though :-S

buddemat
  • 4,552
  • 14
  • 29
  • 49
JanineKNZ
  • 35
  • 4
  • Can you share how you're using `data` afterwards? – Mureinik Dec 06 '20 at 21:27
  • Hi @Mureinik, I have added the additional code, which isn't too much at the moment as I've been spending most of my time on trying to fix the string formatting issue. Thanks – JanineKNZ Dec 06 '20 at 21:47

3 Answers3

1

You can try running the following command for your data dataFrame:

for col in data.select_dtypes(include=['object']).columns:
    data[col] = data[col].apply(lambda x: ''.join(x) if x is not None else x)

It should go through all columns that include text or mixed numeric and non-numeric values and transform tuple elements to strings.

Kate_E
  • 26
  • 2
0

I see two problems / ways to adress this:

The first way is to solve the "symptoms", i.e. work with the weird output you get and transform it to represent what it should be. This can be done using str.replace and str.join/str.split:

''.join(','.split("('d','o','g')".replace('(','').replace (')','')))

The second and better way, however, is to solve the underlying problem. I do not know where the problem is, but try executing your code line by line with a debugger to see where this malformed strings appear first.

dasjanik
  • 1
  • 1
  • thanks for your answer! The replace method doesn't seem to replace anything in this case, i.e. the code runs fine but the data still shows the same way. I think that's because these aren't actual commas and brackets in strings but tuples that are showing with commas and brackets. As for the debugging, the data shows this way for all non numeric outputs, however, in SQL tools like DBeaver or Toad the data comes back without commas and brackets so no issues with the SQL, it just seems to be the way Python shows the data that is weird. – JanineKNZ Dec 09 '20 at 19:48
0

Rather than treat the symptoms: the cause of the problem seems to lie in the use of jpype (see also this SO answer and the comments).

To prevent the problem from occurring in the first place, you can change your connection, e.g. in one of the following two ways:

  1. Use jaydebeapi without jpype:

    When this problem happened to me, I noticed that the use of jpype was not necessary after all. I passed the driver as a .jar file (in my case locally in the same directory) to jaydebeapi.connect() and my connection worked fine without jpype:

    import jaydebeapi as jdba
    import pandas as pd
    import os
    
    db_host = 'db.host.com'
    db_port = 1521
    db_sid = 'YOURSID'
    
    jar=os.getcwd()+'/ojdbc6.jar'
    
    conn = jdba.connect('oracle.jdbc.driver.OracleDriver', 
                    'jdbc:oracle:thin:@' + db_host + ':' + str(db_port) + ':' + db_sid, 
                    {'user': 'USERNAME', 'password': 'PASSWORD'}, 
                    jar
                    )
    
    df_jay = pd.read_sql('SELECT * FROM YOURSID.table1', conn)
    
    conn.close()
    

    This created the dataframe without problems.

  2. Use cx_Oracle instead:

    The issue also did not occur when I used cx_Oracle to connect to the Oracle db:

    import cx_Oracle
    import pandas as pd
    import os
    
    db_host = 'db.host.com'
    db_port = 1521
    db_sid = 'YOURSID'
    
    dsn_tns = cx_Oracle.makedsn(db_host, db_port, db_sid)
    cx_conn = cx_Oracle.connect('USERNAME', 'PASSWORD', dsn_tns)
    
    df_cxo = pd.read_sql('SELECT * FROM YOURSID.table1', con=cx_conn)
    
    cx_conn.close()
    

    Note: For cx_Oracle to work you have to have the Oracle Instant Client installed and properly set up (see e.g. cx_Oracle documentation for Ubuntu).

If one of these approaches is an option for you, you prevent the entire problem from occurring in the first place.

buddemat
  • 4,552
  • 14
  • 29
  • 49