I have a Spark 3 cluster set up. I have data in SQL Server, around 100 GB in size, and I need to run various queries on it from the Spark cluster. I have connected to SQL Server from Spark via JDBC and run a sample query. Now, instead of having the queries execute on SQL Server, I want to move/copy the data to the Spark cluster and run the queries there (SQL Server is taking too much time, which is why we are using Spark in the first place). There are around 10 tables in the database.
What are the possible ways to achieve this?
If I execute the query directly from Spark against SQL Server, it takes too much time because SQL Server is the bottleneck (it runs on a single machine). Is there a better way to do this?
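For context, this is roughly how I am reading from SQL Server today, a minimal PySpark sketch (the server address, database name, table name, and credentials are placeholders, not my real values):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sqlserver-test").getOrCreate()

# Placeholder connection details -- replace with the real server/credentials.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver:1433;databaseName=mydb")
      .option("dbtable", "dbo.some_table")
      .option("user", "spark_user")
      .option("password", "****")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())

# Queries against this DataFrame end up being pushed down to / pulled
# through SQL Server, which is where the slowness shows up.
df.createOrReplaceTempView("some_table")
spark.sql("SELECT COUNT(*) FROM some_table").show()
```

I am wondering whether I should instead do a one-time bulk read of each table (e.g. write each DataFrame out to Parquet on the cluster's storage) and run all subsequent queries against those copies.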