
We use Cloudera (CDH 5.7.5) and Hue 3.9.0. For the admin user, some Hive tables (60%) are accessible through Impala; the other Hive tables are not. For a non-admin user, no database is accessible through Impala, yet some databases are accessible via Hive.

Is it because the Impala catalog is not in sync with the Hive metastore? When I ran INVALIDATE METADATA (for all databases), I got a read operation timeout error. I also ran INVALIDATE METADATA for some individual tables, but that had no effect. What do I need to check?

FYI, I get this error every time I run a query via Impala, but not via Hive:

AuthorizationException: User 'test.user' does not have privileges to execute 'SELECT' on: default.test01 

FYI2: INVALIDATE METADATA now runs fine. For the admin user, all databases and tables are accessible via both Hive and Impala. But for a non-admin user, the authorized databases are accessible only through Hive, not through Impala.

This is part of the Hue log:

[13/Jul/2018 10:32:05 +0700] thrift_util  DEBUG    Thrift call: <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.CloseOperation(args=(TCloseOperationReq(operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=3, operationId=THandleIdentifier(secret="o\xe8}\x9a\xf6'F\x8d\x9aC\xd4!\xb2#:\x91", guid="o\xe8}\x9a\xf6'F\x8d\x9aC\xd4!\xb2#:\x91"))),), kwargs={})
[13/Jul/2018 10:32:05 +0700] thrift_util  DEBUG    Thrift call <class 'ImpalaService.ImpalaHiveServer2Service.Client'>.CloseOperation returned in 1ms: TCloseOperationResp(status=TStatus(errorCode=None, errorMessage=None, sqlState=None, infoMessages=None, statusCode=0))
[13/Jul/2018 10:32:05 +0700] access       INFO     10.192.64.252 myuser.test - "POST /notebook/api/autocomplete/ HTTP/1.1"
[13/Jul/2018 10:32:05 +0700] dbms         DEBUG    Query Server: {'SESSION_TIMEOUT_S': 43200, 'QUERY_TIMEOUT_S': 600, 'server_name': 'impala', 'server_host': 'serverhost.com', 'querycache_rows': 50000, 'server_port': 21050, 'auth_password_used': False, 'impersonation_enabled': True, 'auth_username': 'hue', 'principal': 'impala/serverhost.com'}
[13/Jul/2018 10:32:05 +0700] dbms         DEBUG    Query Server: {'SESSION_TIMEOUT_S': 43200, 'QUERY_TIMEOUT_S': 600, 'server_name': 'impala', 'server_host': 'serverhost.com', 'querycache_rows': 50000, 'server_port': 21050, 'auth_password_used': False, 'impersonation_enabled': True, 'auth_username': 'hue', 'principal': 'impala/serverhost.com'}
Mahadi Siregar
  • Do you use Sentry and Kerberos in your cluster? Do the tables you can access with Hive use any SerDe that Impala may not support (JSON, for instance)? – Cheloute Jul 12 '18 at 05:50
  • What about your cluster? Size, Impala daemon memory, timeout config... you could start by checking these parameters. – Cheloute Jul 12 '18 at 06:08
  • I just managed to run INVALIDATE METADATA successfully. The schema is now in sync, but only for the admin user. @Cheloute, yes, we use Sentry. Through Impala the tables are accessible only to the admin user, while through Hive they are accessible to all users. – Mahadi Siregar Jul 12 '18 at 06:18
  • How strange... Is Hive configured to integrate with Sentry? Could you check? Next, do your admin user and test user belong to the same group? By the way, how do you manage your groups? LDAP? Local Linux groups? Neither, just the "hue" group? Do you observe the same behaviour using impala-shell from a terminal? And using beeline? – Cheloute Jul 12 '18 at 07:09
  • @Cheloute, yes, it's strange. Yes, Hive is integrated with Sentry. The admin and test users are not in the same group. Groups are managed in LDAP. I can't access the OS, because it is managed by another team :( – Mahadi Siregar Jul 12 '18 at 07:51
  • My question is: if Sentry already works in Hive, do we need extra configuration in Impala? I compared our configuration with this documentation (https://www.cloudera.com/documentation/enterprise/5-8-x/topics/sg_sentry_service_config.html#concept_z5b_42s_p4__sentryserref), and everything seems to be configured correctly. – Mahadi Siregar Jul 12 '18 at 08:13
  • One thing that differs is that the admin user has been added to Sentry's Admin Groups => hue, impala, hive, admin – Mahadi Siregar Jul 12 '18 at 08:25
  • No, you don't need anything else beyond configuring Sentry's policies once both Hive and Impala are set up to work with Sentry. By Sentry policies I mean creating roles and groups and defining permissions. Did you check "Enable Sentry Synchronization" in the HDFS config? – Cheloute Jul 12 '18 at 08:35
  • No, it's not checked. Do we need to check it? But how would it affect the schema? I mean, the schema is just about the Hive metastore and the Impala catalog. – Mahadi Siregar Jul 12 '18 at 08:43
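
The policy check discussed in the comments (roles, groups, permissions) could be sketched roughly as follows. This is only an illustration, not a confirmed fix: it pulls the user, privilege, and table out of the AuthorizationException quoted in the question and prints the statements a Sentry admin might run in impala-shell or the Hue Impala editor to see what the test user's group is granted. The names `test_group` and `test_role` are hypothetical placeholders; the real LDAP group and Sentry role are not given in the thread.

```python
import re

# Pattern for the Impala authorization error quoted in the question.
_AUTH_ERROR = re.compile(
    r"User '(?P<user>[^']+)' does not have privileges to execute "
    r"'(?P<privilege>[^']+)' on: (?P<target>\S+)"
)


def parse_auth_error(message):
    """Return (user, privilege, target) from the error text, or None."""
    match = _AUTH_ERROR.search(message)
    if match is None:
        return None
    return match.group("user"), match.group("privilege"), match.group("target")


def sentry_check_statements(group, role, privilege, table):
    """Build statements to inspect, and if needed add, a table-level grant.

    `group` and `role` are hypothetical names; substitute the real ones.
    """
    return [
        f"SHOW ROLE GRANT GROUP {group}",  # roles assigned to the group
        f"SHOW GRANT ROLE {role}",         # privileges held by the role
        f"GRANT {privilege} ON TABLE {table} TO ROLE {role}",  # only if missing
    ]


error = (
    "AuthorizationException: User 'test.user' does not have privileges "
    "to execute 'SELECT' on: default.test01"
)
user, privilege, table = parse_auth_error(error)
statements = sentry_check_statements("test_group", "test_role", privilege, table)
for statement in statements:
    print(statement + ";")
```

If the first two statements show the expected role and privilege for the group and Impala still rejects the query, the problem is more likely in how Impala resolves the user's groups than in the Sentry policy itself.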

0 Answers