
I need the SQL equivalent of an AUTO_INCREMENT id in Hadoop.

When my reduce task identifies a new item, each such item needs a unique ID assigned.

  • How can I share an atomic counter across the cluster? The Reporter counters seem to be increment-only; there's no getAndIncrement feature that I can see.

  • How can I set that counter before the map/reduce phase of the job starts?

David Parks
    possible duplicate of [Distributed sequence number generation?](http://stackoverflow.com/questions/2671858/distributed-sequence-number-generation) – Praveen Sripati Oct 27 '12 at 05:23

1 Answer


For distributed ID generation you can either just generate UUIDs or use Apache ZooKeeper, which provides distributed coordination for Hadoop clusters. Disclaimer: I have never used ZooKeeper, so I don't know whether you can really (even theoretically) get a globally contiguous set of IDs, which is what the question seems to be asking.
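Since the answer doesn't commit to a specific ZooKeeper recipe, here is only a rough illustration of what a cluster-wide counter might look like. It assumes the Apache Curator recipes library (`DistributedAtomicLong`); the connect string, znode path, and retry settings are placeholders, not values from the question.

```java
// Sketch only: a shared counter backed by ZooKeeper via Apache Curator.
// Connect string and counter path are illustrative assumptions.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.atomic.AtomicValue;
import org.apache.curator.framework.recipes.atomic.DistributedAtomicLong;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkIdAllocator implements AutoCloseable {
    private final CuratorFramework client;
    private final DistributedAtomicLong counter;

    public ZkIdAllocator(String zkConnectString, String counterPath) {
        this.client = CuratorFrameworkFactory.newClient(
                zkConnectString, new ExponentialBackoffRetry(1000, 3));
        this.client.start();
        this.counter = new DistributedAtomicLong(
                client, counterPath, new ExponentialBackoffRetry(1000, 3));
    }

    /** Atomically reserve the next id; Curator handles the retry logic. */
    public long nextId() throws Exception {
        AtomicValue<Long> result = counter.increment();
        if (!result.succeeded()) {
            throw new IllegalStateException("ZooKeeper increment failed");
        }
        return result.postValue();
    }

    @Override
    public void close() {
        client.close();
    }
}
```

A reducer could open one allocator in `setup()` and call `nextId()` per new item, but note that every increment is a round trip to ZooKeeper, so this is only practical when new items are relatively rare.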

Generating UUIDs does have a cost, though: each one takes some time to produce.
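For comparison, a minimal sketch of the UUID route inside a reducer. Only `java.util.UUID` and the standard Hadoop `Reducer` API are used; the `NewItemReducer` name and the `Text` key/value types are illustrative.

```java
// Sketch: tag each new item with a locally generated UUID.
// No cross-node coordination is needed, but the ids are random, not sequential.
import java.io.IOException;
import java.util.UUID;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NewItemReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Type-4 UUIDs come from a SecureRandom source, which is where the
        // generation cost mentioned above comes from.
        String id = UUID.randomUUID().toString();
        for (Text value : values) {
            context.write(new Text(id), value);
        }
    }
}
```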

For good general information on distributed ID generation, see this Stack Overflow question.

– Ray Toal
  • Yeah, they have to be incrementing IDs in a specific range, not just unique. – David Parks Oct 27 '12 at 03:40
  • I thought that was what you wanted. Check out ZooKeeper then. While I've done a lot with Hadoop, I've always generated UUIDs, because the very thought of building in a global atomic integer just seemed weird. On a 1,000-node cluster you want 999 machines to wait? Seriously, I expect that the ZooKeeper people figured this all out, however intractable it seems. If you can't get what you want, generate UUIDs in the map phase, then create a contiguous set in the reduce phase or in a separate sequential process _after_ your MR jobs complete (a sketch of that two-pass approach follows below). – Ray Toal Oct 27 '12 at 04:48
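A rough sketch of that two-pass idea: after the first job has tagged records with UUIDs, a second job forced to a single reducer can hand out contiguous IDs. The `SequentialIdReducer` name and the `Text` key/value types are made up for illustration.

```java
// Sketch: second-pass job that replaces temporary UUID keys with contiguous ids.
// Correct only when the job runs with exactly one reducer.
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SequentialIdReducer extends Reducer<Text, Text, Text, Text> {
    private long nextId = 0;  // safe only because a single reducer sees all keys

    @Override
    protected void reduce(Text uuid, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String id = Long.toString(nextId++);
        for (Text value : values) {
            context.write(new Text(id), value);
        }
    }
}
```

The driver would need to call `job.setNumReduceTasks(1)` for the IDs to stay contiguous, which serializes the final pass; that is the trade-off the comment above alludes to.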