Questions like this one seem to indicate that a database can be queried directly from PySpark. I would like to update a data pipeline that currently uses Sqoop to use this approach instead. With Sqoop, however, you can pass -P to be prompted for the password, so your credentials stay hidden. I don't see how to do the equivalent with the JDBC interface: all the examples I can find suggest hardcoding the username and password into the scripts. The data in my environment is sensitive, so I cannot do this.

df = sqlCtx.load(source="jdbc",
                 url="jdbc:oracle:thin://x.x.x.x/xdb?user=****&password=****",
                 dbtable="somequery")

I have read that even libraries such as getpass, which hide the input from the terminal, are sometimes vulnerable to memory attacks. Is there a safe way to do this?
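
For illustration, here is a minimal sketch of one way to keep the credentials out of the script, roughly analogous to Sqoop's -P prompt. It assumes the username and password come from environment variables, with an interactive getpass prompt as the fallback. The variable names DB_USER and DB_PASSWORD and the helper load_credentials are hypothetical, not part of any library. This only avoids hardcoding; it does not address the memory-attack concern, because the password still lives in the driver process.

```python
import getpass
import os


def load_credentials(env=None, prompt=getpass.getpass):
    """Return JDBC connection properties without hardcoding credentials.

    DB_USER and DB_PASSWORD are hypothetical environment variable names;
    any value that is missing is requested interactively with echo disabled.
    """
    env = os.environ if env is None else env
    user = env.get("DB_USER")
    password = env.get("DB_PASSWORD")
    if user is None:
        user = prompt("Database user: ")
    if password is None:
        password = prompt("Database password: ")
    return {"user": user, "password": password}


# Hypothetical usage with an existing SparkSession named `spark`
# (not executed here; the URL is the one from the snippet above
# with the credentials removed):
#
# props = load_credentials()
# df = spark.read.jdbc(url="jdbc:oracle:thin://x.x.x.x/xdb",
#                      table="somequery",
#                      properties=props)
```

Passing the credentials through the properties argument instead of the URL query string also keeps them out of the JDBC URL itself, which is more likely to be logged or displayed than a separate properties dictionary.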
