3

I have between a few million and a billion (10^9) input data sets that need to be processed. They are quite small (< 1 kB each), and each one takes about 1 second to process.

I have read a lot about Apache Hadoop, MapReduce and StarCluster, but I am not sure which approach is the fastest and most efficient way to process this data.

I am thinking of using Amazon EC2 or a similar cloud service.
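
A rough back-of-the-envelope estimate helps frame the scale. Assuming every input is independent (so the work parallelizes cleanly) and ignoring scheduling and I/O overhead, 10^9 inputs at about 1 second each is 10^9 CPU-seconds, while the input data itself is at most about 1 TB. The function names and the core count and deadline used below are illustrative assumptions, not measurements:

```python
import math


def wall_clock_hours(num_records, seconds_per_record, num_cores):
    """Idealized wall-clock hours: perfectly parallel work, zero overhead."""
    total_cpu_seconds = num_records * seconds_per_record
    return total_cpu_seconds / num_cores / 3600.0


def cores_needed(num_records, seconds_per_record, deadline_hours):
    """Smallest whole number of cores that meets the deadline (idealized)."""
    total_cpu_seconds = num_records * seconds_per_record
    return math.ceil(total_cpu_seconds / (deadline_hours * 3600.0))


# Hypothetical scenario: 10^9 records, 1 s each.
hours_on_1000_cores = wall_clock_hours(10**9, 1.0, 1000)
cores_for_one_day = cores_needed(10**9, 1.0, 24)
print(f"{hours_on_1000_cores:.1f} hours on 1000 cores")
print(f"{cores_for_one_day} cores to finish within 24 hours")
```

Under these assumptions the job is dominated by CPU time rather than data volume, so the number of cores matters far more than storage or network throughput.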

El Developer
Mark
  • Hadoop and MapReduce are pretty adaptable but they are definitely better at some things. Are you willing/able to code? What languages do you know? What kind of processing do you need to do on the data? – Paul M Jul 24 '12 at 19:47
  • I guess I could have just looked at your profile ;) – Paul M Jul 24 '12 at 19:49
  • @PaulM The language does not matter, I know Python, Java, Ruby, C, C++ so I will (hopefully) be able to learn it :) The input is a small String and it will be processed like a sha512 hash - at least it is some hash-like function - but other details I am not allowed to provide. – Mark Jul 24 '12 at 19:52
  • Sounds like you're working on a rainbow table / password cracker? – BonanzaDriver Jul 25 '12 at 14:40

1 Answer

3

You might consider something like Amazon EMR, which takes care of a lot of the plumbing involved in running Hadoop. If you're just looking to code something quickly, Hadoop Streaming, Hive and Pig are all good tools for getting started with Hadoop without requiring you to know all the ins and outs of MapReduce.

Paul M
  • Thanks for your reply. I have added some details in the question's comments. Can you recommend a particular method (streaming/Hive/Pig)? Sorry that I cannot provide more details. – Mark Jul 24 '12 at 19:55
  • In that case, I would try using Hadoop Streaming on Amazon EMR. Hadoop Streaming lets you write MapReduce programs like Unix pipelines using your language of choice. The tradeoff is a performance penalty that may or may not be meaningful to you. Amazon EMR saves you the trouble of spinning up a cluster, but you do have to pay for it. – Paul M Jul 24 '12 at 20:16
  • Thanks, I will have a deeper look at it. – Mark Jul 24 '12 at 20:59
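
To make the Hadoop Streaming suggestion above concrete, here is a minimal sketch of a mapper in Python. Streaming passes input records to the mapper one line at a time on standard input and collects tab-separated key/value lines from standard output. The real processing function was not disclosed, so `process_record` uses SHA-512 purely as a stand-in; the names `process_record` and `run_mapper` are hypothetical, and the sketch assumes one record per line with no embedded tabs:

```python
import hashlib


def process_record(record: str) -> str:
    """Stand-in for the real, undisclosed function (assumption: SHA-512)."""
    return hashlib.sha512(record.encode("utf-8")).hexdigest()


def run_mapper(lines, out):
    """Read one record per line and write 'record<TAB>result' lines."""
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue  # skip blank lines
        out.write(f"{record}\t{process_record(record)}\n")


# When deployed as the streaming mapper script, the file would end with:
#     import sys
#     run_mapper(sys.stdin, sys.stdout)
```

Because each record is processed independently, this can run as a map-only job with no reduce phase (Hadoop Streaming has a `-numReduceTasks 0` option for that). The same script can be tried locally first by piping a sample file through it, which is the Unix-pipeline behavior that streaming mimics.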