
I'm working on a Hungarian Twitter client and would like to implement a trend system. I have a database of tweet texts and Unix timestamps that record when each tweet was created.

How can I write a PHP script that gives me about 10 "Trending topics"? I don't even know where to start with this problem.

Joe Doyle
19greg96
  • It is not that easy task - and you need to start from reading the basics of [Data Mining](http://en.wikipedia.org/wiki/Data_mining) – zerkms Jan 25 '12 at 01:38
  • Probably worth a read: http://stackoverflow.com/questions/143781/what-is-search-twitter-coms-trending-topics-algorithm – cmbuckley Jan 25 '12 at 01:46
  • Yes, I know how it should work, I just don't know how to get at it in php, or any other language :S – 19greg96 Jan 25 '12 at 01:53
  • One way is to count hashtags and display the top 10 between dates. – Martin Samson Jan 25 '12 at 04:12

1 Answer


You need to design an algorithm that is able to tell you the trends.

To do that, you first need to define what a trend is: for example, a term or a person's name that appears in tweets. You could also consider whether a tweet has been retweeted, how often, and over what period.

So you need to analyse each tweet, extract the information you're looking for, and then combine it with the timestamps to decide what's trending, e.g. what is used more (or less) in one period than in some other period.
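As a rough illustration of "used more in one period than in another", here is a minimal Python sketch (the question says any language is fine). The function name `trending`, the one-hour default window and the add-one smoothing are all assumptions made for the example, not an established trend algorithm.

```python
from collections import Counter

def trending(term_events, now, window=3600, top_n=10):
    """Rank terms by how much more often they occur in the latest window
    than in the window of the same length just before it.

    term_events: iterable of (term, unix_timestamp) pairs.
    """
    recent, previous = Counter(), Counter()
    for term, ts in term_events:
        age = now - ts
        if 0 <= age < window:
            recent[term] += 1
        elif window <= age < 2 * window:
            previous[term] += 1
    # Add one to the denominator so terms that are new in the latest
    # window do not cause a division by zero (an assumed smoothing choice).
    scores = {t: recent[t] / (previous[t] + 1) for t in recent}
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

A term that is used constantly scores lower here than one that suddenly appears, which is usually closer to what people mean by "trending" than a plain count.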

Tweets can be parsed with regular expressions, for example to pull out hashtags, mentions and individual words.
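A minimal sketch of such parsing in Python follows. The patterns are simplified assumptions and do not follow Twitter's exact rules for hashtags or mentions. In Python 3, `\w` matches Unicode letters, so accented Hungarian characters are handled.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters only, no digits or underscores

def extract_terms(text):
    """Return (hashtags, mentions, words) found in one tweet, lowercased."""
    text = URL_RE.sub(" ", text.lower())   # drop links before tokenizing
    hashtags = HASHTAG_RE.findall(text)
    mentions = MENTION_RE.findall(text)
    # Remove hashtags and mentions so they are not counted again as plain words.
    plain = MENTION_RE.sub(" ", HASHTAG_RE.sub(" ", text))
    words = WORD_RE.findall(plain)
    return hashtags, mentions, words
```

For example, `extract_terms("Boldog #Karácsony @Anna")` yields the hashtag `karácsony`, the mention `anna` and the word `boldog`.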

The extracted keywords can then be saved to a database table that acts as an index.

You can then use a query language like SQL to obtain information about trends from the normalized data.
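To make the index idea concrete, here is a sketch using Python's built-in `sqlite3` module. The table name `term_index` and its columns are hypothetical; your schema will differ. The query counts how often each term occurs between two timestamps and returns the most frequent ones.

```python
import sqlite3

# In-memory database for the example; a real system would use a file or a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE term_index (term TEXT NOT NULL, created_at INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_term_time ON term_index (created_at, term)")

def top_terms(conn, start_ts, end_ts, limit=10):
    """Return (term, uses) rows for the most used terms in [start_ts, end_ts)."""
    cursor = conn.execute(
        """
        SELECT term, COUNT(*) AS uses
        FROM term_index
        WHERE created_at >= ? AND created_at < ?
        GROUP BY term
        ORDER BY uses DESC, term ASC
        LIMIT ?
        """,
        (start_ts, end_ts, limit),
    )
    return cursor.fetchall()
```

Running the same query for two adjacent periods and comparing the counts gives a simple "more than before" signal.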

Normally you start with simple scripts so that you can test your algorithm.

Since your question doesn't define what kind of trends you're looking for, it can only be answered in general terms. Some tips, however:

  • Obtain tweets only once, cache them (looks like you already have this).
  • The more data you have, the better you can test your algorithm/system, so obtain data first.
  • Define the processes you can apply to that cached data, such as parsing and normalizing, and decide which database backend to use.
  • Allow your system to have multiple trend algorithms so you can test them against each other.
  • Find out about stop-words in your language / domain (search engines are a related topic, since they also need to filter out unimportant words).
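
The last two tips could be sketched like this in Python. The stop-word set is a tiny illustrative sample of common Hungarian function words, not a real list, and the two scoring functions and the `ALGORITHMS` registry are hypothetical names chosen for the example.

```python
from collections import Counter

# Illustrative sample only; a usable stop-word list would be much longer.
STOP_WORDS = {"a", "az", "és", "hogy", "is", "nem", "egy"}

def remove_stop_words(words):
    return [w for w in words if w not in STOP_WORDS]

def by_raw_count(recent, previous):
    """Score = number of uses in the recent period."""
    return dict(recent)

def by_growth(recent, previous):
    """Score = recent uses relative to previous uses (add-one smoothed)."""
    return {t: recent[t] / (previous[t] + 1) for t in recent}

# Registry of interchangeable trend algorithms so they can be compared.
ALGORITHMS = {"raw_count": by_raw_count, "growth": by_growth}

def compare_algorithms(recent_words, previous_words, top_n=10):
    """Run every registered algorithm on the same data and return each ranking."""
    recent = Counter(remove_stop_words(recent_words))
    previous = Counter(remove_stop_words(previous_words))
    results = {}
    for name, algorithm in ALGORITHMS.items():
        scores = algorithm(recent, previous)
        results[name] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return results
```

Keeping each algorithm behind the same two-argument signature makes it easy to add another one and see how its ranking differs on the same cached data.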
hakre
  • The problem is that it would be nice to have not just one word / hashtag / mention in a trend but more words, maybe even a sentence eg. "Happy Australia Day", "Page 25 of 366" – 19greg96 Jan 25 '12 at 12:51
  • That's about parsing the tweet. You need to put relation between words then. As written, get a good chunk of data into your cache so you can tweak the parsing process until it matches your wishes. The more data you have collected, the better you can find out about relations between words. And probably you want to read some books about textual analysis for the languages you want to support. Talk with some linguists, they do research about such things. – hakre Jan 25 '12 at 12:57