0

I've studied Azure Synapse and distribution types.

Hash-distributed table needs a column to distribute the data between different nodes,

For me it's the same idea of partition, I saw some examples that uses a hash-key, partition and index. It's not clear in my mind their differences and how to choose one of them. How Hash-key, partition and index could work together?

1 Answers1

0

Just an analogy which might explain the difference between Hash and Partition

Suppose there exists one massive book about all history of the world. It has the size of a 42 story building.

Now what if the librarian splits that book into 1 book per year. That makes it much easier to find all information you need for some specific years. Because you can just keep the other books on the shelves. A small book is easier to carry too.

That's what table partitioning is about. (Reference: Data Partitioning in Azure) Keeping chunks of data together, based on a key (or set of columns) that is usefull for the majority of the queries and has a nice average distribution. This can reduce IO because only the relevant chunks need to be accessed.

Now what if the chief librarian unbinds that book. And sends sets of pages to many different libraries. When we then need certain information, we ask each library to send us copies of the pages we need. Even better, those librarians could already summarize the information of their pages and then just send only their summaries to one library that collects them for you.

That's what the table distribution is about. (Reference: Table Distribution Guidance in Azure) To spread out the data over the different nodes.

For more details: What is a difference between table distribution and table partition in sql?

https://www.linkedin.com/pulse/partitioning-distribution-azure-synapse-analytics-swapnil-mule

And indexing is physical arrangement within those nodes

Nandan
  • 3,939
  • 2
  • 8
  • 21
  • That was a nice analogy you made. I was thinking about Indexing in Book. For me, the Hash Distribution in Azure and Indexing in SQL sounded the same. Can you please confirm if they are? – Ashwin Kumar Aug 11 '23 at 07:54