
I have a system where users make posts. Each post has a title and the post content itself; the content is plain text between 20 and 3,000 words long.

I also have a set of more than 700 categories; some are top-level categories and the rest are subcategories.

When the user enters the content for their post, they need to be prompted with up to 5 relevant categories, selected automatically based on what the user has typed in.

What is the best way to do this? I am using PHP and MySQL; links to any libraries or code samples would be useful.

user1448020
  • Try looking into MySQL full-text search at http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html and take a look at the post http://stackoverflow.com/questions/2039240/php-display-links-to-related-content for ideas. You could try matching the content against your categories table; if a certain number of a category's keywords are hit, boost that category's relevancy rating. – Kris Jun 22 '12 at 00:03
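The keyword-hit idea from the comment can be sketched as below. This is a minimal illustration in Python rather than PHP (the logic translates directly), and it assumes each category has a small list of lowercase, single-word keywords. MIN_HITS and relevant_categories are hypothetical names, and the threshold value is an arbitrary example, not something from the comment.

```python
import re
from collections import Counter

# Hypothetical threshold: a category must reach at least this many keyword
# occurrences in the text before it is treated as relevant.
MIN_HITS = 2


def relevant_categories(text, category_keywords, min_hits=MIN_HITS):
    """Return {category_id: hits} for categories whose keywords occur
    at least `min_hits` times in the text (assumes single-word keywords)."""
    # Tokenize once: lowercase words made of letters, digits or apostrophes.
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    result = {}
    for cat_id, keywords in category_keywords.items():
        # Counter returns 0 for words that never appear.
        hits = sum(counts[k.lower()] for k in keywords)
        if hits >= min_hits:
            result[cat_id] = hits
    return result


example = {1: ["php", "mysql"], 2: ["football", "goal"]}
print(relevant_categories("PHP and MySQL: a PHP tutorial", example))  # {1: 3}
```

Counting every occurrence (rather than each distinct keyword once) lets a text that keeps returning to one topic pull ahead; MySQL full-text search does its own relevance ranking instead and is not shown here.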

1 Answer


User perspective

You cannot do this in a single step on the same page with only PHP/MySQL, because PHP runs on the server and only sees the text once the form is submitted. There are mainly two options:

  • You also learn and use a client-side language (in practice, JavaScript) to send the text to the server and run the search without reloading the page. I don't know much about it, so I cannot recommend anything specific, but this thread should help you.

  • You use an intermediate page: the user submits the content, you parse it on the server, and the next page offers the suggested categories for the user to choose from. The drawback is that many users might close the window right after pressing 'send', since they expect the post to be published straight away; the advantage is that it only needs PHP/MySQL.

Parsing the text

Again, I'm not sure this is the most efficient way, but I would try the following and keep testing until it gives the expected results:

First, create a short list of keywords for each category. Four or five should do the trick, but this depends greatly on the categories, the texts, and many other factors.
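A minimal sketch of the keyword lists and the "1 point per keyword found" scoring, written in Python rather than PHP. The category IDs and keywords are made up for illustration (in the real system they would live in a MySQL table), and CATEGORY_KEYWORDS and score_categories are hypothetical names. It assumes single-word keywords.

```python
import re

# Hypothetical keyword lists: category id -> four or five single-word keywords.
CATEGORY_KEYWORDS = {
    1: ["php", "mysql", "database", "sql"],
    2: ["football", "goal", "league", "match"],
    3: ["recipe", "cook", "oven", "flour"],
}


def score_categories(text, category_keywords):
    """Give each category 1 point for every one of its keywords found in the text."""
    # The set of distinct lowercase words in the post.
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return {
        cat_id: sum(1 for k in keywords if k.lower() in words)
        for cat_id, keywords in category_keywords.items()
    }


print(score_categories("Storing football league tables in MySQL", CATEGORY_KEYWORDS))
# {1: 1, 2: 2, 3: 0}
```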

Then create an array of 10 elements: 5 hold the IDs of the current best categories and 5 hold their 'scores'. You can, for example, add 1 to a category's score for each of its keywords found in the text. Remember to assign initial values (for example, scores of 0) or you'll have nothing to compare against.

Next, search the text for each category's keywords. If a category scores higher than any of the five currently stored, replace the lowest-scoring one with the new category.
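The "10-element array" and "replace the minimum" steps can be sketched like this, again in Python as a stand-in for PHP. It is an illustration under the assumption that the per-category scores have already been computed into a dict; top_five is a hypothetical name. The two lists mirror the 5 ids and 5 scores described above.

```python
def top_five(scores, k=5):
    """Keep the k best categories by repeatedly replacing the lowest stored one.

    scores: dict mapping category id -> score (e.g. keywords found).
    Returns a list of (category_id, score) pairs, best first.
    """
    # k id slots and k score slots. Initial scores of 0 give the first
    # comparisons something to beat; None marks a slot not yet filled.
    ids = [None] * k
    best = [0] * k
    for cat_id, score in scores.items():
        # Index of the lowest score currently stored.
        lowest = min(range(k), key=lambda i: best[i])
        if score > best[lowest]:
            ids[lowest] = cat_id
            best[lowest] = score
    picked = [(ids[i], best[i]) for i in range(k) if ids[i] is not None]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)
```

Because a category must strictly beat the stored minimum, categories that never score above the initial 0 are never suggested, and on a tie the category stored first keeps its slot.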

At the end of the script, echo the 5 remaining categories; they should be the 5 most suitable. Keep in mind, though, that there are many other ways to approach this parsing problem.
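If you would rather not manage the slots by hand, the same selection can be written with Python's standard heapq.nlargest; this is a swapped-in alternative to the manual scheme, not what the answer describes. suggest_categories is a hypothetical name, and the input is assumed to be a dict of category id to score.

```python
import heapq


def suggest_categories(scores, limit=5):
    """Return up to `limit` category ids with the highest positive scores."""
    # Skip categories with no keyword hits at all.
    positive = [(cat_id, score) for cat_id, score in scores.items() if score > 0]
    best = heapq.nlargest(limit, positive, key=lambda pair: pair[1])
    return [cat_id for cat_id, _ in best]


print(suggest_categories({1: 3, 2: 0, 3: 5, 4: 1, 5: 2, 6: 4}))  # [3, 6, 1, 5, 4]
```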

Community
Francisco Presencia