1

I am new to hive. I am using hive with internal S3. I have external table I am trying to create a index . but its failing with this error, Any suggestions.

0: jdbc:hive2://localhost:10000>  CREATE INDEX ix_tab_kafkabatch ON TABLE tab_raw (kafkabatch)  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'  WITH DEFERRED REBUILD  IN TABLE tab_raw_index_table;

Here is the error:

error: Error while compiling statement: FAILED: ParseException line 1:7 cannot recognize input near 'CREATE' 'INDEX' 'ix_tab_kafkabatch' in ddl statement (state=42000,code=40000) 0: jdbc:hive2://localhost:10000>

Hive logs

2020-06-07T17:51:23,633 DEBUG [HikariPool-1 housekeeper] pool.HikariPool: HikariPool-1 - Pool stats (total=10, active=0, idle=10, waiting=0)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199) ~[hive-service-3.1.1.jar:3.1.1]
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260) ~[hive-service-3.1.1.jar:3.1.1]
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) ~[hive-service-3.1.1.jar:3.1.1]
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541) ~[hive-service-3.1.1.jar:3.1.1]
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527) ~[hive-service-3.1.1.jar:3.1.1]
        at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_252]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_252]
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) ~[hive-service-3.1.1.jar:3.1.1]
        at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) ~[hive-service-3.1.1.jar:3.1.1]
        at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) ~[hive-service-3.1.1.jar:3.1.1]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_252]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_252]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[hadoop-common-3.1.3.jar:?]
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) ~[hive-service-3.1.1.jar:3.1.1]
        at com.sun.proxy.$Proxy42.executeStatementAsync(Unknown Source) ~[?:?]
        at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312) ~[hive-service-3.1.1.jar:3.1.1]
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562) ~[hive-service-3.1.1.jar:3.1.1]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) ~[hive-service-3.1.1.jar:3.1.1]
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-3.1.1.jar:3.1.1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_252]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: org.apache.hadoop.hive.ql.parse.ParseException: line 1:7 cannot recognize input near 'CREATE' 'INDEX' 'ix_order_kafkabatch' in ddl statement
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:223) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:74) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:67) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:616) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1826) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1773) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1768) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) ~[hive-exec-3.1.1.jar:3.1.1]
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197) ~[hive-service-3.1.1.jar:3.1.1]
        ... 26 more
eshirvana
  • 23,227
  • 3
  • 22
  • 38
  • 1
    Indexes in Hive was useless feature, it was removed in Hive 3.0 https://stackoverflow.com/a/61317247/2700344 – leftjoin Jun 08 '20 at 07:38
  • Thanks for your comment. I have used case where I am processing files based on kafka events. ``` – user5499459 Jun 08 '20 at 18:35
  • Kafka or not Kafka, but you are using Hive> 3.0. Indexes are removed since 3.0. And according to the exception message, your Hive version is 3.1.1. No indexes any more – leftjoin Jun 08 '20 at 18:46
  • Thanks for your comment. I have used case where I am processing files based on kafka events.I am using presto do the compute and converting csv(_raw) to orc(_hr) table. ``` insert into order_hr select * from order_raw where kafkabatch in (1591591570879,1591591597134,1591591607434.....) ``` _raw - external table with csv files, where another application is writing csv files continusly. _hr - orc table. It takes few minutes to process these files.. I am trying to optimize this – user5499459 Jun 08 '20 at 18:50
  • if one kafkabatch contains many data, consider partitioning on it. Or redesign ingestion flow completely. Forget about indexes they even if possible will do even worse. Describe the process in details, identify the bottleneck - is it select which is slow or something else. post detailed flow descriptions along with queries and their actual plans and stats and we most probably will help – leftjoin Jun 08 '20 at 19:57

0 Answers0