Let's assume we have a database like:
Actions_tbl:
-------------------------------------------------------- id | Action_name | user_id| -------------------------------------------------------- 1 | John reads one book | 1 2 | reading the book by john | 1 3 | Joe is jumping over fire | 2 4 | reading another book | 2 5 | John reads the book in library | 1 6 | Joe read a book | 2 7 | read a book | 3 8 | jumping with no reason is Ronald's habit| 3
Users_tbl:
----------------------- user_id | user_name | ----------------------- 1 | John 2 | Joe 3 | Ronald 4 | Araz -----------------------
Wondering if I can choose the most repeated similar action regardless of it's user and replace my own user_name with its current user!
Read one book, reading the book, reading another book, read the book in library, read a book and read a book are the ones who have most common WORDS so the staffs related to reading the book is repeated 6 times, my system should show one of those six sentences randomly and replace Araz with user_name
Like: Araz reads the book
My Idea was to
select replace(a.action_name , b.user_name) from actions_tbl a, user_tble b where a.user_id = b.user_id group_by
and then check the similarities one by one in php using
levenshtein()
But this one doesn't have performance at all!
Assume that I want to do the same thing for a big db and for few different tables. This will destroy my server!!!
Any better IDEA?
in http://www.artfulsoftware.com/infotree/queries.php#552 the levenshtein() function is implemented as a MySQL function but firstly, do u think it has enough performance? and then, how to use it in my case? Maybe a self-join van fix this issue but I'm not that good with sql!
* similar action, are the actions that have more than X% common words
** More information and notes:**
I'm limited to PHP and MySQL.
This is just an example, in my real project the actions are long paragraphs. That's why the performance is a matter. The real scenario is: user inputted the description of its project for several projects, those data may be too similar(users would have the same area of work), I want to fill automatically(base on previous fillings) the description of next project, to save time.
I would appreciate if you can have any pragmatical Solution. I checked the NLP related solutions, although they r interesting, but I don't think many of them can be accurate and can be implemented using PHP.
The output should make sense and be a proper paragraph like all other projects. That's why I was thinking of choosing from previous ones.
Thanks for your intellectual answers, its really appreciated if you could shed some light on the situations