2

The documentation says:

It is recommended that new tables which are expected to have heavy read and write workloads have at least as many tablets as tablet servers.

If I have as many tablets as data disks (for instance 3 tablet servers, 10 disks per node, I split the table in 30 partitions), will kudu be smart enough to put a tablet per disk or am I actually limiting performance?

I wonder in theory (assuming a very big table) what would be the best:

  • 3 partitions (1 per tablet server)
  • 30 partitions (1 per disk)
  • more than 30 (because my table is really big)
Guillaume
  • 2,325
  • 2
  • 22
  • 40

1 Answers1

0

I can answer to the best of my knowledge. We have 24 tablet servers but we created tables with more than 400 tablets (partitions). That is the 3rd option in your list. We talked to Kudu Dev group a couple of time and we were told is that the scans will be better if Disk Rowsets are evenly distributed and they are not more than a few Gigabytes per tablet (partition).

We have seen that table queries/writes performed better when we distributed the big table across multiple tablets rather than restricting them to equal number of tablet servers.

Things that impacted us are:

  1. table writes are better when we did asynchronous writes and checked for write status when we flushed data.
  2. table scans are better when the tablets have even and less data sizes.
  3. There is a lot of disk IO when the amount of data in each tablet is very high. This caused write and read issues. We saw Kudu RPC Issues and queue backup when there are minimal tablets.
  4. Also we had a table with more than 15G of data in 1 tablet and the queries are so bad on it and we had to redistribute data.

Our experience was that it's always better to have more tablets with even distribution (evenly distributed compacted Disk Rowsets) and under 10GB per tablet. Make sure compaction is going well and not having issues.

Maverick4U
  • 41
  • 1
  • 1
  • 6