5

I'm trying to create a table in a Hive Database using SqlAlchemy ORM. My setting is Python 3.6 with PyHive==0.6.1 and SQLAlchemy==1.2.11 (with their relative dependencies) and Hive 1.1.0-cdh5.15.1.

My approach is the following:

from sqlalchemy import create_engine

host = 'localhost'
port = 10000
database = 'foo'

engine = create_engine(f'hive://{host}:{port}')
engine.execute(f'CREATE DATABASE {database}')
engine.execute(f'USE {database}')

this works fine connecting to Hive and creating a new database. At this point I create the data model:

from sqlalchemy import Column
from sqlalchemy import String
from sqlalchemy import Integer
from sqlalchemy.ext.declarative import declarative_base

ModelBase = declarative_base()


class TestTable(ModelBase):
    __tablename__ = 'test_table'

    id = Column(Integer, primary_key=True)
    text = Column(String(32), index=True)

and I try to:

ModelBase.metadata.create_all(engine)

with no success :( because the following exception raises:

OperationalError: (pyhive.exc.OperationalError) TExecuteStatementResp(status=TStatus(statusCode=3, infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Error while compiling statement: FAILED: ParseException line 3:10 mismatched input 'NOT' expecting ) near 'INT' in create table statement:28:27", 'org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:400', 'org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:187', 'org.apache.hive.service.cli.operation.SQLOperation:runInternal:SQLOperation.java:271', 'org.apache.hive.service.cli.operation.Operation:run:Operation.java:337', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:439', 'org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:405', 'sun.reflect.GeneratedMethodAccessor21:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:606', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:415', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1924', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59', 'com.sun.proxy.$Proxy28:executeStatement::-1', 'org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:257', 'org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:501', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615', 'java.lang.Thread:run:Thread.java:745', "*org.apache.hadoop.hive.ql.parse.ParseException:line 3:10 mismatched input 'NOT' expecting ) near 'INT' in create table statement:32:5", 'org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:208', 'org.apache.hadoop.hive.ql.parse.ParseDriver:parse:ParseDriver.java:170', 'org.apache.hadoop.hive.ql.Driver:compile:Driver.java:524', 'org.apache.hadoop.hive.ql.Driver:compileInternal:Driver.java:1358', 'org.apache.hadoop.hive.ql.Driver:compileAndRespond:Driver.java:1345', 'org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:185'], sqlState='42000', errorCode=40000, errorMessage="Error while compiling statement: FAILED: ParseException line 3:10 mismatched input 'NOT' expecting ) near 'INT' in create table statement"), operationHandle=None) [SQL: '\nCREATE TABLE `test_table` (\n\t`id` INT NOT NULL, \n\t`text` STRING, \n\tPRIMARY KEY (`id`)\n)\n\n'] (Background on this error at: http://sqlalche.me/e/e3q8)

in which I think this is the relevant part:

Error while compiling statement: FAILED: ParseException line 3:10 mismatched input 'NOT' expecting ) near 'INT' in create table statement"), operationHandle=None) [SQL: '\nCREATE TABLE `test_table` (\n\t`id` INT NOT NULL, \n\t`text` STRING, \n\tPRIMARY KEY (`id`)\n)\n\n']

Here PyHive#sqlalchemy the example assume the table to be already there, but what if I need to create it?

Pierluigi
  • 1,048
  • 2
  • 9
  • 16
  • Dear @Pierluigi , could you find the solution? As far as I search, there is no way to use SQLalchemy.ext.declarative.declarative_base for making query or creating table. Also, it seems PyHive is not able to create Hive table at all, it's developed to make queries. – Seyed Hasan Mousavi Sep 08 '20 at 17:58
  • @SeyedHasanMousavi sorry no updates on this :( – Pierluigi Sep 19 '20 at 14:24
  • @Pierluigi Hi, I'm also dealing the same problem as you. Have you found a way to create your tables. Thanks in advance – Dhayf OTHMEN Feb 03 '23 at 23:43
  • @DhayfOTHMEN sorry a lot of time passed :( what about the solution provided by RKB below? – Pierluigi Feb 14 '23 at 08:56

1 Answers1

3

I recently ran into the same issue with a similar error. Oddly I found that if I re-ran the cell in Jupyter the table was created in Hive.

To get rid of the error and have the code run smoothly, I added a create table statement. The final code below ran without error. If you need to add more data you should just be able to comment out the engine.execute line. Good luck!

from sqlalchemy import create_engine

#Input Information
host = 'username@local-host'
port = 10000
schema = 'hive_schema'
table = 'new_table'

#Execution
engine = create_engine(f'hive://{host}:{port}/{schema}')
engine.execute('CREATE TABLE ' + table + ' (col1 col1-type, col2 col2-type)')
Data.to_sql(name=table, con=engine, if_exists='append')
RKB
  • 73
  • 1
  • 11
  • Not sure how it could have gone through in Jupyter either, since `NOT NULL` (the query that sqlalchemy emits) is not a valid usage in CREATE TABLE: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable – GuSuku Jul 25 '19 at 15:22