I am creating an application like Twitter.

I am stuck at one point. I have all the tweets stored along with the users' profiles.

Now I need an algorithm that performs well for finding the most trending words across the whole application, among all users.

My naive approach is:

  1. Scan the complete database
  2. Search for recurring words
  3. Create a record for each word with its number of occurrences
  4. Keep track of, say, the 1000 most frequent words
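
The steps above can be sketched in Python roughly like this. It is a minimal illustration, assuming tweets are available as plain strings; `top_words_naive` and the word-splitting regex are hypothetical choices, not part of any existing codebase. It makes one full pass over every tweet, which is exactly why it gets expensive as the data grows.

```python
import re
from collections import Counter

def top_words_naive(tweets, k=1000):
    """Count every word in every tweet and return the k most frequent.

    `tweets` is assumed to be an iterable of tweet strings, standing in
    for "the complete database".
    """
    counts = Counter()
    for text in tweets:
        # Lowercase, then treat runs of letters, digits, apostrophes or '#' as words
        counts.update(re.findall(r"[a-z0-9'#]+", text.lower()))
    # most_common(k) returns up to k (word, count) pairs, highest count first
    return counts.most_common(k)

print(top_words_naive(["Big game tonight", "the game was great", "Great game!"], k=2))
# [('game', 3), ('great', 2)]
```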

But for a large application, that seems far too heavy.

Can anyone suggest some better approaches?

Bernhard Barker

1 Answer

You probably only want to retrieve posts from the last hour or day, not from the entire database.
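
As a minimal sketch of the time-window idea, assuming each post carries a timezone-aware creation timestamp (the `Post` class and `recent_posts` function here are hypothetical):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Post:
    # Hypothetical post record; a real schema will differ
    text: str
    created_at: datetime

def recent_posts(posts, window=timedelta(hours=1), now=None):
    """Keep only posts created within `window` before `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - window
    return [p for p in posts if p.created_at >= cutoff]
```

In a real database you would push this filter into the query itself (a condition on an indexed timestamp column) rather than loading every post and filtering in application code.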

You should filter out extremely common words, such as the 100 most common English words; you don't want "the" to be a trending word.
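
A minimal sketch of stop-word filtering. The `STOP_WORDS` set below is a tiny illustrative placeholder, not a real list of the 100 most common English words, and `tokenize` is a hypothetical helper.

```python
import re

# Tiny illustrative stop-word set; in practice you would load a proper
# list of the most common English words
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was", "it", "i", "you"}

def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase, split into words, and drop stop words."""
    words = re.findall(r"[a-z0-9'#]+", text.lower())
    return [w for w in words if w not in stop_words]

print(tokenize("The game was great and the crowd loved it"))
# ['game', 'great', 'crowd', 'loved']
```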

Likewise, I recommend counting a word only once per post, so that a post with "booger booger booger booger booger" and a post with "booger" each count as a single instance of the word "booger".
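
Counting once per post amounts to counting how many posts contain each word. A minimal sketch, with `count_once_per_post` as a hypothetical helper: converting each post's words to a `set` collapses repeats before they reach the counter.

```python
import re
from collections import Counter

def count_once_per_post(posts):
    """Count how many posts contain each word."""
    counts = Counter()
    for text in posts:
        # set() collapses repeats within a single post
        counts.update(set(re.findall(r"[a-z0-9'#]+", text.lower())))
    return counts

counts = count_once_per_post(["booger booger booger booger booger", "booger"])
print(counts["booger"])  # 2
```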

If you don't need precise word counts, you can probably make do with scanning a random sample of the most recent posts, e.g. 10% of them.
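
A minimal sampling sketch, assuming posts are plain strings. `sampled_top_words` and its parameters are hypothetical. Each post is kept independently with probability `rate`, and the sampled counts are divided by `rate` so they roughly estimate the full totals. The top words are usually stable under sampling, but the counts are only approximate.

```python
import random
import re
from collections import Counter

def sampled_top_words(posts, k=10, rate=0.1, seed=None):
    """Estimate the top-k words from a random ~`rate` fraction of posts."""
    rng = random.Random(seed)
    counts = Counter()
    for text in posts:
        # Keep each post independently with probability `rate`
        if rng.random() < rate:
            counts.update(set(re.findall(r"[a-z0-9'#]+", text.lower())))
    # Scale sampled counts back up to approximate full-population totals
    return [(w, round(c / rate)) for w, c in counts.most_common(k)]
```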

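If you can use a divide-and-conquer approach, for example counting words in separate chunks of posts (possibly on separate workers) and then merging the counts, this will help speed things up.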
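
A minimal divide-and-conquer sketch, with all function names hypothetical. Posts are split into chunks, each chunk is counted independently, and the partial counts are merged. Here the chunks are processed sequentially for simplicity; the point is that `count_chunk` needs no shared state, so each call could run in a separate process or on a separate machine.

```python
import re
from collections import Counter

def count_chunk(posts):
    """Divide step: count words in one chunk, once per post."""
    counts = Counter()
    for text in posts:
        counts.update(set(re.findall(r"[a-z0-9'#]+", text.lower())))
    return counts

def merge_counts(partials):
    """Combine step: add the per-chunk counts together."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def chunked_top_words(posts, k=10, n_chunks=4):
    # Interleaved slices; each could be handed to a separate worker
    chunks = [posts[i::n_chunks] for i in range(n_chunks)]
    partials = [count_chunk(chunk) for chunk in chunks]
    return merge_counts(partials).most_common(k)
```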

Zim-Zam O'Pootertoot