I have an application that creates a few dataframes, writes them to disk, then runs a command using vertica_python to load the data into Vertica. The Spark Vertica connector doesn't work because of an encrypted drive.
What I'd like is for the application to issue the load command and then move on to the next job immediately. Instead, it waits for the load to finish in Vertica before moving on to the next job. How can I get the behavior I want? Thanks.
What's weird about this problem is that the command I'd like to run in the background is as simple as db_client.cursor.execute(command). This shouldn't be blocking under normal circumstances, so why is it in Spark?
To be very specific: I'm reading in a dataframe, doing transformations, and writing to S3. Then I'd like to kick off the DB load of those files from S3 and, without waiting for it, take the transformed dataframe, transform it again, write to S3, load to the DB, and so on, multiple times.
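
For illustration, the pattern I'm after might look roughly like the sketch below. It hands each load to a worker thread with Python's standard concurrent.futures, so the driver can continue with the next transform and write while the load runs. The names run_load, pipeline, and stages are hypothetical, and execute stands in for whatever actually issues the statement (for example a function that opens its own vertica_python connection and calls cursor.execute). I'm assuming a connection should not be shared between threads, so each load would need its own.

```python
from concurrent.futures import ThreadPoolExecutor


def run_load(execute, command):
    """Run one blocking load statement; intended to be called on a worker thread.

    `execute` is a hypothetical callable that issues the statement, e.g. a
    function that opens its own DB connection and calls cursor.execute(command).
    """
    execute(command)
    return command


def pipeline(stages, execute, max_workers=2):
    """Run each stage's transform/write on the calling thread, then submit its
    load to a thread pool and continue with the next stage without waiting.

    `stages` is a list of (transform_and_write, load_command) pairs.
    Returns the load commands in submission order once every load has finished.
    """
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for transform_and_write, command in stages:
            transform_and_write()  # blocking work: transform and write to S3
            futures.append(pool.submit(run_load, execute, command))
        # Leaving the `with` block waits for all outstanding loads.
    # result() re-raises any exception that occurred inside a load.
    return [f.result() for f in futures]
```

The driver only blocks at the very end, when the pool shuts down and the results are collected, so a failed load still surfaces as an exception rather than being silently dropped.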