
I'm new to Hadoop and Impala. I managed to connect to Impala by installing impyla and running the following code, which connects via LDAP:

from impala.dbapi import connect
from impala.util import as_pandas
conn = connect(host="server.lrd.com",port=21050, database='tcad',auth_mechanism='PLAIN', user="alexcj", use_ssl=True,timeout=20, password="secret1pass")

I'm then able to grab a cursor and execute queries as:

cursor = conn.cursor()
cursor.execute('SELECT * FROM tab_2014_m LIMIT 10')
df = as_pandas(cursor)

I'd like to use SQLAlchemy to connect to Impala so that I can use some of SQLAlchemy's nicer features. I found a test file in the impyla source code that shows how to create a SQLAlchemy engine with the Impala driver:

engine = create_engine('impala://localhost')

I'd like to do the same, but my call to connect above takes many more parameters, and I don't know how to pass them to SQLAlchemy's create_engine to get a successful connection. Has anyone done this? Thanks.

okyere
    You can use [`connect_args`](http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine.params.connect_args) to specify extra arguments to `connect()`. – univerio Sep 19 '16 at 22:13
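
A minimal sketch of the `connect_args` suggestion in the comment above. It assumes impyla's SQLAlchemy dialect forwards `connect_args` unchanged to `impala.dbapi.connect` (SQLAlchemy's default behaviour, not verified here for every impyla version). The helper names `build_connect_args` and `make_engine` are illustrative only.

```python
def build_connect_args(user, password, timeout=20):
    """Collect the connect() keyword arguments that the URL cannot express."""
    return {
        "auth_mechanism": "PLAIN",  # LDAP, as in the question
        "user": user,
        "password": password,
        "use_ssl": True,
        "timeout": timeout,
    }


def make_engine(host, port, database, user, password):
    """Create an engine; assumes sqlalchemy and impyla are installed."""
    from sqlalchemy import create_engine  # imported lazily on purpose

    url = "impala://{}:{}/{}".format(host, port, database)
    return create_engine(url, connect_args=build_connect_args(user, password))
```

With the values from the question this would be called as `make_engine("server.lrd.com", 21050, "tcad", "alexcj", password)`, where `password` is supplied by the caller.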

3 Answers


As explained at https://github.com/cloudera/impyla/issues/214, you can pass create_engine a creator function that returns the impyla connection:

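import sqlalchemy
from impala.dbapi import connect

def conn():
    # user and pwd must be defined beforehand (your LDAP credentials)
    return connect(host='some_host',
                   port=21050,
                   database='default',
                   timeout=20,
                   use_ssl=True,
                   ca_cert='some_pem',
                   user=user, password=pwd,
                   auth_mechanism='PLAIN')

# creator is called with no arguments whenever the engine needs a new connection
engine = sqlalchemy.create_engine('impala://', creator=conn)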
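
As a variation on the creator approach above, here is a sketch that keeps credentials out of the source by reading them from environment variables. The variable names `IMPALA_USER` and `IMPALA_PASSWORD` and the helpers `make_creator` and `params_from_env` are hypothetical, not part of impyla or SQLAlchemy.

```python
import os
from functools import partial


def make_creator(connect_fn, **params):
    """Return a zero-argument callable suitable for create_engine(creator=...)."""
    return partial(connect_fn, **params)


def params_from_env(env=os.environ):
    """Read credentials from the hypothetical IMPALA_USER / IMPALA_PASSWORD variables."""
    return {
        "user": env["IMPALA_USER"],
        "password": env["IMPALA_PASSWORD"],
        "auth_mechanism": "PLAIN",
        "use_ssl": True,
    }


# Intended usage (requires impyla and sqlalchemy to be installed):
#   from impala.dbapi import connect
#   creator = make_creator(connect, host='some_host', port=21050, **params_from_env())
#   engine = sqlalchemy.create_engine('impala://', creator=creator)
```

Passing the connect function in as an argument also makes the factory easy to exercise with a stand-in when no cluster is available.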
Jorge Lavín

If your Impala is secured by Kerberos, the script below works (for some reason I need to use hive:// instead of impala://):

from sqlalchemy import text
from sqlalchemy.engine import create_engine

# Passed through to the underlying DBAPI connect call
connect_args = {'auth': 'KERBEROS', 'kerberos_service_name': 'impala'}
engine = create_engine('hive://impalad-host:21050', connect_args=connect_args)

conn = engine.connect()
# text() keeps the raw SQL string valid on newer SQLAlchemy versions too
result = conn.execute(text("SELECT * FROM db1.table1 LIMIT 5"))

print(result.fetchall())
ebeb
    Thanks for this, very hard to find Kerberos info for DB connections. Any list of available arguments for the connect_args that you are aware of? – Sanchez333 Jun 09 '22 at 14:11
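
One way to explore the question in the comment above is to inspect the signature of whatever callable the dialect hands `connect_args` to. The sketch below assumes `hive://` is served by PyHive and that its `pyhive.hive.Connection` class is the relevant target; the helper name `connect_parameters` is illustrative.

```python
import inspect


def connect_parameters(connect_fn):
    """Return {parameter name: default} for a DBAPI connect callable or class."""
    signature = inspect.signature(connect_fn)
    return {
        name: ("<required>" if p.default is inspect.Parameter.empty else p.default)
        for name, p in signature.parameters.items()
        if name != "self"
    }


# Intended usage (assumes PyHive is installed):
#   from pyhive import hive
#   print(connect_parameters(hive.Connection))
```

For impyla the same helper could be pointed at `impala.dbapi.connect`.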
from sqlalchemy import create_engine, MetaData, Table

# host, port and database must be defined beforehand
ENGINE = create_engine(
    'impala://{host}:{port}/{database}'.format(
        host=host,    # your host
        port=port,
        database=database,
    )
)
# Bound MetaData and autoload=True are the pre-2.0 SQLAlchemy reflection API
METADATA = MetaData(ENGINE)
TABLES = {
    'table': Table('table_name', METADATA, autoload=True),
}
Pegasus