Does Spark Streaming have a partition-by-key functionality? I would like to partition my DStreams so that all data for the same key ends up in the same partition.

That is, data for key 1 is in partition 1, data for key 2 is in partition 2, and so on.

tsar2512
  • Use a custom partitioner. See https://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where and https://stackoverflow.com/questions/29888165/custom-partiotioning-of-javadstreampairrdd – abalcerek Jun 18 '15 at 12:22
  • Do you mean that you have multiple streams and you want key 1 from all streams to go to partition 1, and so on? – ayan guha Jun 18 '15 at 13:50
  • No, I have a single DStream for now, which I would like to partition by key, i.e. all elements of an RDD in the DStream that belong to key 1 should be in one partition. – tsar2512 Jun 18 '15 at 14:02
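
To make the custom-partitioner idea from the comments concrete, here is a minimal sketch in plain Python (no Spark required) of what such a partitioner would compute. It assumes integer keys and a fixed partition count; the names `partition_for_key`, `partition_by_key`, and `batch` are hypothetical and only illustrate the mapping, they are not part of any Spark API.

```python
from collections import defaultdict


def partition_for_key(key, num_partitions):
    """Hypothetical partition function: integer key k maps to k mod num_partitions."""
    return key % num_partitions


def partition_by_key(records, num_partitions):
    """Group (key, value) pairs by the partition index their key maps to.

    This only simulates, on a local list, what a key-based partitioner
    would do to the pairs of one batch.
    """
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for_key(key, num_partitions)].append((key, value))
    return dict(partitions)


# One illustrative batch of (key, value) pairs (made-up data).
batch = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e")]
result = partition_by_key(batch, 4)

for index in sorted(result):
    print(index, result[index])
```

With four partitions, every pair with key 1 lands in partition 1 and every pair with key 2 in partition 2, which is the layout the question asks for. In PySpark, a function of this kind can be passed as the partition function to `partitionBy` on a pair RDD, applied to each batch through the DStream's `transform`; whether that fits the setup in the question is an assumption, since the question does not say which language binding is in use.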

0 Answers