Questions like this one seem to indicate that a database can be queried directly from PySpark, and I would like to update a data pipeline that currently uses Sqoop to do this instead. With Sqoop, however, you can pass -P so that the password is prompted for at runtime and your credentials stay hidden. I don't see how to do the equivalent with the JDBC interface: all the examples I can find suggest hardcoding the username and password into the scripts. The data in my environment is sensitive, so I cannot do this.
df = sqlCtx.load(source="jdbc",
                 url="jdbc:oracle:thin://x.x.x.x/xdb?user=****&password=****",
                 dbtable="somequery")
I have read that even libraries such as getpass, which hide the input from the terminal, are sometimes vulnerable to memory attacks. Is there a safe way to do this?
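For reference, the getpass-based alternative I have in mind would look roughly like the sketch below. It assumes a newer Spark where spark.read.jdbc accepts a properties dict; the helper names build_jdbc_properties and load_table are my own placeholders, not part of any library. It keeps the credentials out of the script and the URL, but the password still sits in driver memory, which is the part I am unsure about.

```python
import getpass


def build_jdbc_properties(user, prompt=getpass.getpass):
    """Prompt for the password at runtime and return JDBC connection properties.

    `prompt` is injectable (hypothetical convenience) so this can be
    exercised without an interactive terminal.
    """
    password = prompt("Password for {}: ".format(user))
    return {"user": user, "password": password}


def load_table(spark, url, table, user):
    """Read `table` over JDBC with credentials passed as properties, not in the URL."""
    props = build_jdbc_properties(user)
    # The password is never written to the script, but it is still held
    # in the driver process's memory for the lifetime of `props`.
    return spark.read.jdbc(url=url, table=table, properties=props)
```

Usage would be something like load_table(spark, "jdbc:oracle:thin:@//x.x.x.x/xdb", "somequery", "myuser"), where the URL form and user name are only illustrative.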