I was able to read from an HDFS location on one HA-enabled Hadoop cluster and write to an HDFS location on another HA-enabled Hadoop cluster using Spark by following the steps below:
1) Check whether the KDCs of the two clusters are in the same realm or in different realms. If they are in the same realm, skip this step; otherwise, set up cross-realm authentication between the two KDCs.
One might follow: https://community.cloudera.com/t5/Community-Articles/Setup-cross-realm-trust-between-two-MIT-KDC/ta-p/247026
Scenario-1: Reading and writing between the clusters is a recurring operation
2) Edit the hdfs-site.xml of the source cluster to add the target cluster's nameservice, as per the steps mentioned in the link below (a runtime alternative is sketched right after it):
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_administration/content/distcp_between_ha_clusters.html
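If editing hdfs-site.xml is not convenient, the same HA client settings can also be applied at runtime on Spark's Hadoop configuration. Below is a minimal sketch, assuming the local nameservice is called sourceCluster and the remote one targetCluster with NameNodes nn1/nn2 on the hosts shown; all of these names are placeholders for your own values.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cross-cluster-setup").getOrCreate()

// Register the remote HA nameservice with the HDFS client at runtime.
// These are the same properties the linked doc adds to hdfs-site.xml;
// nameservice and host names below are placeholders.
val hc = spark.sparkContext.hadoopConfiguration
hc.set("dfs.nameservices", "sourceCluster,targetCluster") // keep your local nameservice in the list
hc.set("dfs.ha.namenodes.targetCluster", "nn1,nn2")
hc.set("dfs.namenode.rpc-address.targetCluster.nn1", "targetCluster-01.xyz.com:8020")
hc.set("dfs.namenode.rpc-address.targetCluster.nn2", "targetCluster-02.xyz.com:8020")
hc.set("dfs.client.failover.proxy.provider.targetCluster",
  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")

Note that this only covers path resolution; the Kerberos property in step 3 still has to be supplied at launch time.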
3) Add the below property to the Spark configuration when launching the application:
spark.kerberos.access.hadoopFileSystems=hdfs://targetCluster-01.xyz.com:8020
Basically, the value should be the address (host:port, i.e. the InetSocketAddress) of the active NameNode of the target cluster.
4) In your code, give the absolute path of your target HDFS location.
For example: df.write.mode(SaveMode.Append).save("hdfs://targetCluster-01.xyz.com/usr/tmp/targetFolder")
Note: In step 4 you can also provide a logical path like hdfs://targetCluster/usr/tmp/targetFolder, since we added the target nameservice to our hdfs-site.xml in step 2; the logical URI keeps working even if the active NameNode fails over. An end-to-end sketch of steps 3 and 4 follows.
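Putting steps 3 and 4 together, an end-to-end job for this recurring scenario could look like the sketch below. Paths and host names are placeholders; in cluster mode, pass the Kerberos property with --conf on spark-submit instead, since delegation tokens are obtained at launch.

import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch of a recurring cross-cluster copy. Assumes a valid Kerberos ticket
// (or keytab) for a user with access to both clusters, and that the target
// nameservice "targetCluster" was added to hdfs-site.xml in step 2.
val spark = SparkSession.builder()
  .appName("cross-cluster-copy")
  // Step 3: allow Spark to fetch delegation tokens for the remote HDFS.
  .config("spark.kerberos.access.hadoopFileSystems", "hdfs://targetCluster-01.xyz.com:8020")
  .getOrCreate()

// Read from the source (local) cluster; the input path is a placeholder.
val df = spark.read.parquet("/usr/tmp/sourceFolder")

// Step 4: write with the fully qualified target path; the logical form
// hdfs://targetCluster/usr/tmp/targetFolder also works thanks to step 2.
df.write.mode(SaveMode.Append).save("hdfs://targetCluster-01.xyz.com/usr/tmp/targetFolder")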
Scenario-2: This is an ad-hoc request where you need to perform the read and write operation only once
Skip step#2 mentioned above. (Without the nameservice entry in hdfs-site.xml, only the physical NameNode address from step 4 will resolve, not the logical path.)
Follow step#3 and step#4 as they are.
PS: The user running the job should have access to both clusters for this to work.