Sentry cannot send a full image snapshot of the Hive table access control lists to HDFS, so the HDFS ACLs and Hive table ACLs are not synchronized.
I am running Cloudera CDH 5.14.2, which contains Sentry 1.5.1 and Hadoop 2.6.0. I have enabled Sentry and ACL synchronization with HDFS.
Recently we restarted the cluster to pick up some changes to the HDFS log4j configuration. After the cluster came back up, we found that the ACLs on the Hive tables and the corresponding HDFS files were no longer synchronized.
We then rolled back the log4j configuration and restarted the cluster again, but the ACLs are still not synchronized between the Hive tables and the HDFS files.
After some investigation, we suspect the cause is the large number of tables and partitions in Hive (we have millions of them), as described in SENTRY-2183. We therefore set sentry.hdfs.service.client.server.rpc-connection-timeout to 1800000 in the hive-site.xml file of the metastore server, but that did not help.
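For reference, the change corresponds roughly to the fragment below. This is a sketch of the property entry in the standard Hadoop configuration format, not a verbatim copy of our file. The property name and value are the ones mentioned above; the comment about the unit is an assumption on my part.

```xml
<!-- hive-site.xml on the Hive metastore server -->
<property>
  <name>sentry.hdfs.service.client.server.rpc-connection-timeout</name>
  <!-- 1800000 is assumed to be in milliseconds, i.e. 30 minutes -->
  <value>1800000</value>
</property>
```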
We noticed warning and error messages in the Sentry and HDFS logs. The Sentry log shows this warning:
"WARN org.apache.thrift.transport.TIOStreamTransport: Error closing output stream. java.net.SocketException: Socket closed"
The HDFS log shows this error:
"ERROR org.apache.sentry.core.common.transport.RetryClientInvocationHandler: failed to execute getAllUpdateFrom java.lang.reflect.InvocationTargetException", which is caused by "org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out"
Any ideas?