My code connects to SQL Server using PySpark. The password for that connection is stored encrypted in a JCEKS file. How can I decrypt the password and use it to load tables from SQL Server? Please suggest.

import pyspark
import re
from pyspark_llap import HiveWarehouseSession
from pyspark.sql.functions import struct
from pyspark.sql.functions import *
from pyspark.sql.session import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL data source example") \
    .getOrCreate()

hive = HiveWarehouseSession.session(spark).build()

df1 = spark.read.format("jdbc") \
    .option("url", "URL") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("dbtable", "tableName") \
    .option("user", "user") \
    .option("password", "password_alias").load()
Amrutha K
  • @AmruthaK It sounds like `jceks` is Java-proprietary, so you'd need to use [tag:jython] to interact with the [tag:java] libs. – Maximilian Burszley Aug 06 '19 at 14:22
  • Use the pyjks Python module to decrypt it. – Narsireddy Aug 06 '19 at 15:03
  • Thanks. I got the answer to my question from this link: https://community.hortonworks.com/articles/108918/implementing-data-integrity-check-using-spark-jdbc.html – Amrutha K Aug 07 '19 at 09:36

1 Answer

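I know it's a little late to answer this question, but here is one way to use the alias as the password.

Resolve password_alias to the actual password through the Hadoop configuration (hadoopConfiguration), which reads it from the JCEKS credential store, and pass the resulting string to Spark.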

# JCEKS store location and the alias under which the password is stored
jceks_path = "jceks://hdfs/sqlserver.password.jceks"
alias = "password_alias"

# Point the Hadoop configuration at the JCEKS credential provider
conf = spark.sparkContext._jsc.hadoopConfiguration()
conf.set("hadoop.security.credential.provider.path", jceks_path)

# getPassword returns a Java char[] (None if the alias is missing);
# build a Python string from it one character at a time
credential_raw = conf.getPassword(alias)
if credential_raw is None:
    raise ValueError("Alias not found in credential store: " + alias)
password = "".join(str(credential_raw[i]) for i in range(len(credential_raw)))

# Pass the password string to Spark
df1 = spark.read.format("jdbc") \
    .option("url", "URL") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("dbtable", "tableName") \
    .option("user", "user") \
    .option("password", password).load()
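If several jobs need the same lookup, the steps above could be wrapped in a small helper. This is only a sketch: the function name read_jceks_password is hypothetical (not part of PySpark or Hadoop), and it assumes the object passed in behaves like the Hadoop Configuration returned by spark.sparkContext._jsc.hadoopConfiguration(), that is, it offers set() and getPassword().

```python
def read_jceks_password(hadoop_conf, jceks_path, alias):
    """Return the password stored under `alias` as a Python str.

    Hypothetical helper: `hadoop_conf` is assumed to expose set() and
    getPassword() like org.apache.hadoop.conf.Configuration does.
    """
    # Tell Hadoop which credential store to consult
    hadoop_conf.set("hadoop.security.credential.provider.path", jceks_path)

    # getPassword yields a char array, or None when the alias is unknown
    credential_raw = hadoop_conf.getPassword(alias)
    if credential_raw is None:
        raise KeyError("alias %r not found in %s" % (alias, jceks_path))

    # Index element by element, which works for py4j arrays and plain sequences
    return "".join(str(credential_raw[i]) for i in range(len(credential_raw)))
```

With a live session it would be called as read_jceks_password(spark.sparkContext._jsc.hadoopConfiguration(), jceks_path, alias), and the returned string passed to .option("password", ...). Failing early on a missing alias avoids a less obvious login error from the JDBC driver later.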
Rohit Nimmala