
I'm using Azure Event Hubs to stream device data, and I want to ensure that each device's data always goes to the same partition. The flow is: data is sent to IoT Hub, where each device has its own partition key, and is then forwarded to Event Hubs with the same partition key as the IoT Hub one (both have the same number of partitions).

I tried setting the EventData.PartitionKey property equal to the IoT Hub partition key, but I understand now that this property is hashed, so if I set PartitionKey = 1 the event won't necessarily go to partition 1. With this solution the Event Hubs partition distribution is pretty bad (half of the partitions get no data at all). I also tried CreatePartitionedSender, which gives the right result, but each time I create a partition sender it's like creating a new Event Hubs client, and I get an error about the number of connections (the limit on AMQP connections per namespace).
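Roughly what both attempts look like (simplified sketch using the Microsoft.ServiceBus.Messaging SDK, since that's where CreatePartitionedSender/SendBatch live; the connection string, partition key, and payload here are placeholders):

    using System.Threading.Tasks;
    using Microsoft.ServiceBus.Messaging; // WindowsAzure.ServiceBus NuGet package

    public static async Task SendForDeviceAsync(string eventHubConnectionString, string iotHubPartitionKey, byte[] payload)
    {
        var client = EventHubClient.CreateFromConnectionString(eventHubConnectionString);

        // Attempt 1: PartitionKey is hashed by the service, so "1" does not necessarily land on partition 1
        await client.SendAsync(new EventData(payload) { PartitionKey = iotHubPartitionKey });

        // Attempt 2: send straight to the matching partition - correct placement,
        // but creating many senders like this eventually hits the AMQP connection limit
        var sender = client.CreatePartitionedSender(iotHubPartitionKey);
        await sender.SendAsync(new EventData(payload));
    }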

Which would be the better solution?

  1. Write a factory class (is there one already written?) for PartitionSender, so that I have one sender per partition; I would then have to handle health and possibly some concurrency myself.
  2. Keep working with the EventData.PartitionKey property and set a better hash value (maybe the device ID instead of the partition number), so I don't have to write anything extra or handle connection errors.

Or maybe there is a better solution?

Update: I tried setting EventData.PartitionKey to the device ID, but I got the error "All event data in a SendBatch operation must have the same partition key". So this is a bad solution as-is, because if I split each send by partition key I end up with many small send operations instead of just one. Thanks.

guylot
  • Why not just set partition key to device ID? – Mikhail Shilkov May 07 '18 at 09:28
  • This is one of the options I mentioned. I'm just not sure that the distribution across the event hub partitions will be good enough – guylot May 07 '18 at 11:06
  • If you have enough devices sending data, the distribution will even out eventually. Guaranteed. If you're only sending from a few devices, maybe a service bus partitioned queue is a better fit for your scenario? – Dan May 28 '18 at 14:15
  • If you're interested in seeing how I implemented this, check out my blog post that shows how to get up and running quickly: http://www.zoeller.us/blog/2018/5/28/up-and-running-with-azure-event-hubs – Dan May 28 '18 at 15:43

1 Answer

When I did similar things I pretty much went with option 2, that is, splitting into one send operation per partition key. In terms of the number of send operations it's obviously not ideal, but you should be able to measure the performance impact and see whether it's actually a problem. You can also use some multiple of your partition count as the number of distinct partition key values, so that when Azure rehashes them you don't need to worry as much about collisions and the resulting imbalance. That is, with 32 partitions, perhaps use 128 (or more) distinct partition key values, so that any one partition ends up out of balance in 25% (or smaller) increments rather than 100%.
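A minimal sketch of what I mean, assuming the Microsoft.ServiceBus.Messaging SDK from your question (the 128-bucket count and the device-ID hashing are just examples; string.GetHashCode is not guaranteed stable across processes, so use a stable hash in production):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.ServiceBus.Messaging;

    public static class BucketedSender
    {
        private const uint BucketCount = 128; // several logical keys per physical partition

        // Derive a bucket key from the device id so a device's events always share a partition key
        private static string GetBucketKey(string deviceId) =>
            ((uint)deviceId.GetHashCode() % BucketCount).ToString();

        // One SendBatch per bucket, since every event in a batch must share the same partition key
        public static async Task SendGroupedAsync(EventHubClient client, IEnumerable<(string DeviceId, byte[] Payload)> messages)
        {
            foreach (var group in messages.GroupBy(m => GetBucketKey(m.DeviceId)))
            {
                var batch = group.Select(m => new EventData(m.Payload) { PartitionKey = group.Key }).ToList();
                await client.SendBatchAsync(batch);
            }
        }
    }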

I haven't used the partitioned sender myself, but I believe the end of this answer may point you in the direction of how to avoid using multiple connections:

"Also, if the Send pattern you are using is Partitioned Senders - they all will also use the same underlying MessagingFactory - if you create all Senders from the same eventHubClient(.CreatePartitionedSender()) instance."
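So one sketch, untested and assuming that same older SDK, would be a single EventHubClient that hands out cached partition senders:

    using System.Collections.Concurrent;
    using System.Threading.Tasks;
    using Microsoft.ServiceBus.Messaging;

    public class PartitionSenderCache
    {
        private readonly EventHubClient _client;
        private readonly ConcurrentDictionary<string, EventHubSender> _senders =
            new ConcurrentDictionary<string, EventHubSender>();

        public PartitionSenderCache(string connectionString)
        {
            // A single client means a single underlying MessagingFactory (and AMQP connection)
            // shared by all partition senders created from it
            _client = EventHubClient.CreateFromConnectionString(connectionString);
        }

        public Task SendToPartitionAsync(string partitionId, EventData data)
        {
            var sender = _senders.GetOrAdd(partitionId, id => _client.CreatePartitionedSender(id));
            return sender.SendAsync(data);
        }
    }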

cacsar
  • Thanks, that's what I was looking for. I think I'll combine using the MessagingFactory with my own factory for the partition sender, since I'm not sure yet whether the MessagingFactory applies to the partition sender. – guylot May 15 '18 at 15:53