
Scenario/Code Details


I am creating a SparkSession object to store data into a Hive table:

_sparkSession = SparkSession.builder()
        .config(_sparkConf)
        .config("spark.sql.warehouse.dir", "/user/platform")
        .enableHiveSupport()
        .getOrCreate();

After deploying my JAR to the server, I get the exception below:

Caused by: org.apache.spark.sql.AnalysisException:
org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:org.apache.hadoop.security.AccessControlException:
Permission denied: user=diplatform, access=EXECUTE,
inode="/apps/hive/warehouse":hdfs:hdfs:d---------
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)

In my hive-site.xml I set the configuration below. We add this XML to our Spark code so that the default XML at /etc/hive/conf can be overridden:

<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>

<property>
  <name>hive.security.metastore.authorization.auth.reads</name>
  <value>false</value>
</property>

<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider</value>
</property>

<property>
  <name>hive.metastore.authorization.storage.checks</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.cache.pinobjtypes</name>
  <value>Table,Database,Type,FieldSchema,Order</value>
</property>

<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>5s</value>
</property>

<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1800s</value>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>24</value>
</property>

<property>
  <name>hive.metastore.execute.setugi</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.failure.retries</name>
  <value>24</value>
</property>

<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
</property>

<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@EXAMPLE.COM</value>
</property>

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>

<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.server.max.threads</name>
  <value>100000</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://masternode1.com:9083</value>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/platform</value>
</property>
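As a sanity check (a sketch, assuming shell access to a cluster node with the Hive client installed; `beeline -e` against HiveServer2 would work equally), you can ask Hive which warehouse value is actually in effect, and where an existing database was registered when it was created:

```shell
# Print the warehouse setting the running Hive service resolves,
# regardless of what a client-side hive-site.xml says
hive -e "SET hive.metastore.warehouse.dir;"

# Show the location recorded for a database in the metastore
# (fixed at CREATE DATABASE time; later config changes do not move it)
hive -e "DESCRIBE DATABASE default;"
```

If the second command reports /apps/hive/warehouse for the default database, that location is stored in the metastore itself and will keep being used no matter what the client overrides.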

Questions:


  1. The whole development team is unsure why, and from where, the path /apps/hive/warehouse is being picked up, even after we override it in our custom hive-site.xml.

  2. Does the internal HDFS/Hive framework use this location to store intermediate results, and does it therefore require EXECUTE permission on this path?

As per policy we cannot grant 777-level access on /apps/hive/warehouse to users, for two reasons: there may be other, different sets of users in the future, and it is not safe to grant users 777 on the warehouse.

  3. Are the above two reasons correct, or is there some workaround?

2 Answers


The Hive metastore has its own hive-site.xml that determines where Hive tables are located on HDFS. That property is read by the Hive metastore/HiveServer processes, not by Spark.

For example, on a Hortonworks cluster, notice that the warehouse has 777 permissions and is owned by the hive user and the hdfs superuser group.

$ hdfs dfs -ls /apps/hive
Found 2 items
drwxrwxrwx   - hive hadoop          0 2018-02-27 20:20 /apps/hive/auxlib
drwxrwxrwx   - hive hdfs            0 2018-06-27 10:27 /apps/hive/warehouse

According to your error, that directory exists, but its permission bits (d---------) allow no user to read, write, or list its contents.

Ideally, I would suggest not putting the warehouse in an HDFS user directory.
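If changing the server-side warehouse setting is not an option, one workaround (a sketch, with the caveat above about user directories; the database name, path, and HiveServer2 URL/port here are examples, not values from your cluster) is to create each database with an explicit LOCATION that the restricted user already owns, so the default warehouse is never touched:

```shell
# Pre-create a directory the diplatform user can write to
hdfs dfs -mkdir -p /user/platform/mydb.db

# Register the database at that explicit location instead of the
# metastore's default warehouse directory
beeline -u "jdbc:hive2://masternode1.com:10000" \
  -e "CREATE DATABASE mydb LOCATION '/user/platform/mydb.db';"
```

Tables created in that database then land under the database's own location, so no permissions are needed on /apps/hive/warehouse.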

  • We may change the warehouse location to another directory, but the error still persists. How do we change the default location (/apps/hive/warehouse) that is being picked up? – paresh Bapna Jun 27 '18 at 16:49
  • I want to highlight that Kerberos authentication and Ranger authorization are enabled on the server. This is what prevents the diplatform user from accessing /apps/hive and /apps/hive/* – paresh Bapna Jun 27 '18 at 16:53
  • HDFS isn't picking anything; the Hive metastore process is. You need to change the hive-site.xml on that server and restart the Hive processes. I don't have experience with Kerberos or Ranger – OneCricketeer Jun 27 '18 at 17:45
  • The following link may help you change the metastore warehouse directory: https://stackoverflow.com/questions/30518130/how-to-set-hive-metastore-warehouse-dir-in-hivecontext – vaquar khan Jun 27 '18 at 18:10
0

This looks like a permission issue on HDFS for the user "diplatform".

Log in with an admin user and perform the following operations:

hadoop fs -mkdir -p /apps/hive/warehouse
hadoop fs -mkdir /tmp
hadoop fs -chmod -R 777 /apps/hive/warehouse
hadoop fs -chmod 777 /tmp
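Since blanket 777 is against the policy described in the question, a narrower alternative (a sketch, assuming HDFS ACLs are enabled via dfs.namenode.acls.enabled=true on the NameNode, and that Ranger policies do not override them) is to grant only the specific user access:

```shell
# Grant diplatform access to the warehouse without opening it to everyone
hdfs dfs -setfacl -m user:diplatform:rwx /apps/hive/warehouse

# Default ACL so newly created databases/tables inherit the grant
hdfs dfs -setfacl -m default:user:diplatform:rwx /apps/hive/warehouse

# Verify the resulting ACLs
hdfs dfs -getfacl /apps/hive/warehouse
```

On a Ranger-managed cluster, the equivalent grant would normally be expressed as a Ranger HDFS policy for the diplatform user rather than raw ACLs.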

Then run your CREATE DATABASE statement as "diplatform".

  • You are correct; it's a permission issue at the org level. As per our authorization policy, we are not allowed to give full access to /apps/hive/warehouse. Can't we set this location to some other path? I tried setting it in hive-site.xml and including that XML in my code, but all in vain; nothing worked. I am still seeing the exception. – paresh Bapna Jun 27 '18 at 16:43
  • Can we impersonate and resolve this? I tried setting the doAs property to false, but it didn't work. Moreover, when I ran $ hdfs dfs -ls /apps/hive as the diplatform user, it gave the same exception: ls: Permission denied: user=diplatform, access=EXECUTE, inode="/apps/hive":hdfs:hdfs:d- – paresh Bapna Jun 27 '18 at 17:45