Questions tagged [partitioner]

Partitioners are software components that divide a possibly very large group of data into some number of smaller groups, ideally of roughly equal size.

This is a performance technique that reduces the amount of time spent processing the entire data set, especially when the algorithm's cost grows super-linearly (for example, exponentially) with input size.
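
As a rough illustration of the idea above, here is a minimal sketch of a hash partitioner in Python. The names `hash_partition` and `partition` are hypothetical, and the choice of CRC32 as the hash is an assumption made for this example; it is not how any particular framework (Hadoop, Spark, Kafka, and so on) implements its default partitioner.

```python
import zlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions).

    CRC32 is used only because it is deterministic across runs;
    this is an illustrative choice, not a framework default.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition(records, num_partitions):
    """Split (key, value) pairs into num_partitions buckets by key hash."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[hash_partition(key, num_partitions)].append((key, value))
    return buckets


records = [("apple", 1), ("banana", 2), ("apple", 3), ("cherry", 4)]
buckets = partition(records, 3)
```

Every record with the same key lands in the same bucket, so each bucket can be processed independently. Bucket sizes are only roughly equal, and a skewed key distribution can still overload one bucket.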

59 questions
13
votes
2 answers

how to sort word count by value in hadoop?

Hi, I wanted to learn how to sort the word count by value in Hadoop. I know Hadoop takes care of sorting keys, but not values. I know that to sort the values we must have a partitioner, a grouping comparator, and a sort comparator, but I am a bit confused in applying…
user1585111
11
votes
2 answers

Why does sortBy transformation trigger a Spark job?

As per Spark documentation only RDD actions can trigger a Spark job and the transformations are lazily evaluated when an action is called on it. I see the sortBy transformation function is applied immediately and it is shown as a job trigger in the…
Prabu Soundar Rajan
9
votes
2 answers

Difference between combiner and partitioner

I am a newbie to MapReduce and I just can't figure out the difference between the partitioner and the combiner. I know both run in the intermediate step between the map and reduce tasks and both reduce the amount of data to be processed by the reduce task.…
harshit
7
votes
2 answers

In Hadoop Map-Reduce, does any class see the whole list of keys after sorting and before partitioning?

I am using Hadoop to analyze a very uneven distribution of data. Some keys have thousands of values, but most have only one. For example, network traffic associated with IP addresses would have many packets associated with a few talkative IPs and…
Jim Pivarski
6
votes
0 answers

Using KeyFieldBasedPartitioner and Secondary Sorting in Java Hadoop similar to Hadoop Streaming

When using Hadoop streaming, the partitioner and sorter can be set and configured like this: hadoop jar /opt/hadoop/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \ -D mapreduce.map.output.key.field.separator=. \ -D…
irondwarf
6
votes
2 answers

Hadoop partitioner

I want to ask about the Hadoop partitioner: is it implemented within the mappers? How can I measure the performance of the default hash partitioner? Is there a better partitioner for reducing data skew? Thanks
Nada Ghanem
6
votes
1 answer

Hadoop send record to all reducers

How can I send a specific record to all my reducers? I know the Partitioner class and what it does, but I don't see any easy way of making sure a record goes to all the reducers. Basically, the Partitioner has this method: int getPartition(K2…
Razvan
4
votes
3 answers

Using a partitioner in C# to parallel query a REST-API with pagination

I was wondering if my approach is good to query a REST-API in parallel because there is a limit on how many results can be obtained with one request (1000). To speed up things I want to do this in parallel. The idea is to use a partitioner to create…
4
votes
1 answer

Why is a parallel-processing much slower for a first call in C#?

I am trying to process numbers as fast as possible with a C# app. I use Thread.Sleep() to simulate processing, along with random numbers. I use 3 different techniques. This is the test code that I used: using System; using System.Collections.Concurrent; using…
Pavol
4
votes
1 answer

How to allow different keyspaces to use different partitioners in Cassandra?

I am new to Cassandra and have a basic question regarding its partitioners. According to the Cassandra documentation, the partitioner of a cluster should be set in the cassandra.yaml file. My question is: does this mean all keyspaces in a Cassandra…
keelar
2
votes
2 answers

The default Kafka partitioner create hash key collision

I have a topic with 10 partitions, and I have generated events with 9 different keys: A, B, C, D, E, F, G, H, I. I've observed messages distributed like this: Partition 0- (Message1, Key E), (Message2, Key I) Partition 1- (Message3, Key F) . . Partition7-(Message4,…
Dipperman
2
votes
2 answers

How to write Kafka Consumer Client in java to consume the messages from multiple brokers?

I was looking for a Java client (Kafka Consumer) to consume messages from multiple brokers. Please advise. Below is the code written to publish messages to multiple brokers using a simple partitioner. The topic is created with replication factor "2"…
Gopi
2
votes
3 answers

What's the difference between shuffle phase and combiner phase?

I'm pretty confused about the MapReduce framework, and reading about it in different sources has only added to the confusion. This is my idea of a MapReduce job: 1. Map() --> emit 2. Partitioner (OPTIONAL) --> divide intermediate…
rollotommasi
2
votes
1 answer

repartition and sort within partition and custom partitioner in spark giving array out of bound exception

I tried to implement what is explained here. It works when I keep the number of partitions in the custom partitioner equal to one, but when I change this to any other value it throws an array-out-of-bounds exception: Exception in thread "main"…
deenbandhu
2
votes
0 answers

How to avoid input traffic increase for Kafka brokers when using a custom partitioner?

In order to smooth traffic across all Kafka partitions, I tried to write a custom partitioner (extending kafka.producer.Partitioner) on my producers to replace the default partitioner, which only changes partitions every 10 minutes. My partitioner uses a…