Can we use a Spark pool to process data from a dedicated SQL pool, and is that a good architecture?

Question

Is it a good design to use a Spark pool to process data that arrives in a dedicated SQL pool, and then write the results back to the dedicated SQL pool and to ADLS?

As of now we do everything in the dedicated SQL pool. If we add a Spark pool, will it be more efficient, or will it just be a burden on the existing dedicated SQL pool?

This question is fairly broad. I'd imagine it depends heavily on the data, the volume, the schema, and perhaps several other factors. — r2evans, Mar 09 '22 at 14:36
Yes, volume-wise it's huge, as it's streaming data. It loads millions of rows every hour, then we do some processing on that, and then it moves to the consumption layer. — SLL, Mar 09 '22 at 14:54
You can, as long as it's worth it. There is an overhead to copying data over from the dedicated SQL pool with the .synapsesql API, and then back again. I would say use Synapse Spark pools for things you can't already do with SQL, e.g. machine learning, really complex transforms, regex, etc. Can you be more specific about your use case? — wBob, Mar 09 '22 at 15:33
My general rule of thumb: if the input data and output data are both in the same SQL resource, avoid crossing that boundary. Moving data in and out of Dedicated SQL pool can be slow and costly, so you are better off (in most cases) processing the data directly in the SQL pool. — Joel Cochran, Mar 09 '22 at 15:57
Thanks, Joel and Bob, for your comments. I will add the specific use case shortly to make things clearer. — SLL, Mar 09 '22 at 16:35

score 0 · Answer 1 · answered Mar 10 '22 at 06:57

Yes, you can use a Spark pool to process data from a dedicated SQL pool. It is a reasonable architecture, since this pattern is recommended and directly supported by Microsoft.

The Synapse Dedicated SQL Pool Connector is an API that efficiently moves data between Apache Spark runtime and Dedicated SQL pool in Azure Synapse Analytics. This connector is available in Scala.
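As a rough illustration of the round trip the question describes, here is a minimal sketch of how the connector might be used from a Synapse Spark pool. It assumes a Synapse notebook or job where the connector is on the classpath. The helper name `processHourlyBatch`, the three-part table names, the column name, and the ADLS path are all hypothetical placeholders, and the exact options required can vary with the Spark runtime version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
// Connector imports; these ship with Synapse Spark pools, not with open-source Spark
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

// Hypothetical helper: every name in angle brackets is a placeholder to replace.
def processHourlyBatch(spark: SparkSession): Unit = {
  // Read a table from the dedicated SQL pool into a Spark DataFrame.
  // This copy step is the overhead mentioned in the comments on the question.
  val source: DataFrame =
    spark.read.synapsesql("<database>.<schema>.<source_table>")

  // Placeholder transformation; substitute the real processing logic here.
  val processed: DataFrame =
    source.filter(col("<some_column>").isNotNull)

  // Write the result back to an internal (managed) table in the dedicated SQL pool.
  processed.write
    .synapsesql("<database>.<schema>.<target_table>", Constants.INTERNAL)

  // Write a copy to ADLS Gen2 as Parquet with the standard Spark writer.
  processed.write
    .mode("append")
    .parquet("abfss://<container>@<storage_account>.dfs.core.windows.net/<path>")
}
```

In a Synapse notebook the `spark` session already exists, so the body of the helper can also be run cell by cell. Whether the two copies (out of the SQL pool and back in) pay for themselves depends on how much of the processing cannot be expressed in SQL.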

If your project requires large-scale streaming, you can definitely go for Apache Spark. There won't be any burden on the existing architecture, and you will get the expected results.

Refer: Azure Synapse Dedicated SQL Pool connector for Apache Spark
