1.

I am using the Twitter Streaming API to collect tweets with a specific hashtag. I want to extract some metadata from each tweet and use it to update some local data structures.

Sometimes a lot of tweets arrive at my PC in a short time. I am not sure whether my processing speed can keep up with the rate at which tweets arrive. I want to guarantee that every tweet is received successfully and that each of them gets processed.

So do I need to add some structure to cache the tweets I receive? If yes, could you advise which structure or tool to use? A buffer, a thread pool, or caching software like memcached or Redis?

2.

I also want to use the Twitter Search API, which is a RESTful API, to get some tweets. I would get 100 tweets per query. Is it necessary to cache the tweets in this case?

The program will not process these tweets until enough of them (about 30,000) have been collected. Should I use the map-reduce pattern to process tweets at that volume?

Thanks a lot!

Jane

1 Answer

Sometimes a lot of tweets arrive at my PC in a short time. I am not sure whether my processing speed can keep up with the rate at which tweets arrive. I want to guarantee that every tweet is received successfully and that each of them gets processed.

That's exactly what message brokers are for (see this question): just add the tweets to a queue and consume them from there. That way you can scale your consumer processes vertically or horizontally if the queue grows too large.
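As a rough illustration of the queue idea, here is a minimal in-process sketch that uses Python's standard `queue.Queue` as a stand-in for a real message broker. The names `consume`, `run_pipeline`, and `handle` are made up for this example, and a real setup would put a broker between the streaming client and separate consumer processes.

```python
import queue
import threading


def consume(tweet_queue, handle):
    """Take tweets off the queue until a None sentinel arrives."""
    while True:
        tweet = tweet_queue.get()
        try:
            if tweet is None:  # sentinel: no more tweets for this worker
                return
            handle(tweet)
        finally:
            # Mark the item as done so tweet_queue.join() can return.
            tweet_queue.task_done()


def run_pipeline(tweets, handle, num_consumers=2):
    """Feed tweets into a queue and process them with worker threads.

    `tweets` is any iterable (assumed here to stand in for the stream);
    `handle` is a hypothetical callback that updates local data structures.
    """
    tweet_queue = queue.Queue()
    workers = [
        threading.Thread(target=consume, args=(tweet_queue, handle))
        for _ in range(num_consumers)
    ]
    for worker in workers:
        worker.start()

    # Producer side: receiving a tweet only costs a put(), so the
    # receiver is not slowed down by slow processing.
    for tweet in tweets:
        tweet_queue.put(tweet)

    # One sentinel per worker tells each of them to stop.
    for _ in workers:
        tweet_queue.put(None)

    tweet_queue.join()
    for worker in workers:
        worker.join()
```

Raising `num_consumers` is the in-process analogue of adding consumers when the queue grows; with a real broker the same idea applies across processes or machines.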

The program will not process these tweets until enough of them (about 30,000) have been collected. Should I use the map-reduce pattern to process tweets at that volume?

That's batch processing versus online processing; with a queue you can do both. Your consumer process just has to poll the queue size every X seconds (using the message broker's API), and once the size exceeds a specific threshold (30K here), the consumer starts consuming the queue.
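The polling approach could look roughly like the sketch below. It again uses the standard `queue.Queue` in place of a broker, so `qsize()` stands in for whatever queue-length call the broker's API offers. The function names and the `max_polls` parameter are assumptions for illustration only.

```python
import queue
import time


def drain_batch(tweet_queue, batch_size):
    """Remove up to batch_size tweets from the queue and return them."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(tweet_queue.get_nowait())
        except queue.Empty:
            break
    return batch


def wait_for_batch(tweet_queue, threshold, poll_seconds=1.0, max_polls=None):
    """Poll the queue size and return one batch once it reaches threshold.

    Returns an empty list if max_polls polls pass without reaching the
    threshold (max_polls=None means wait indefinitely).
    """
    polls = 0
    # qsize() is only approximate when other threads are active, which is
    # acceptable for a "start the batch once roughly N are waiting" check.
    while tweet_queue.qsize() < threshold:
        polls += 1
        if max_polls is not None and polls > max_polls:
            return []
        time.sleep(poll_seconds)
    return drain_batch(tweet_queue, threshold)
```

With the numbers from the question, the consumer would call something like `wait_for_batch(tweet_queue, threshold=30000, poll_seconds=5)` and then hand the returned list to whatever batch job processes it.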

FGRibreau