I have developed typing software, using PHP and MySQL, that captures the text typed by candidates at my institute. I am now stuck on how to compare the text each candidate typed against the standard paragraph they were given to type (as a hard copy, though the same text is also stored in the MySQL database). My dilemma is whether to run the Levenshtein distance algorithm in PHP or directly in MySQL, so that performance is optimized. I am also worried that a PHP implementation might produce errors when evaluating the texts. It is worth mentioning that the comparison will be used to rank candidates on the basis of words typed per minute.
1 Answer
The simplest solution would be to use PHP's built-in levenshtein() function (docs) to compare the two blocks of text. If you wanted to offload the processing to the MySQL database, you could implement the solution described in the Stack Overflow question "Levenshtein: MySQL + PHP".
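As a rough illustration of the PHP route, the sketch below turns the edit distance into a percentage score for short strings. The helper name typingAccuracy and the scoring formula (distance divided by the longer string's length) are assumptions made for this example, not something prescribed by the PHP manual.

```php
<?php
// Hypothetical helper: converts an edit distance into a 0-100 accuracy score.
// Assumes both strings are short enough for levenshtein() to accept.
function typingAccuracy(string $expected, string $typed): float
{
    // strlen() counts bytes, which matches how levenshtein() compares strings.
    $maxLen = max(strlen($expected), strlen($typed));
    if ($maxLen === 0) {
        return 100.0; // two empty strings are identical
    }
    $distance = levenshtein($expected, $typed);
    return (1 - $distance / $maxLen) * 100;
}

// Example: one transposed pair of letters in a short sentence.
echo typingAccuracy('The quick brown fox', 'The quikc brown fox'), "\n";
```

A score like this only measures accuracy; ranking by words per minute would still need the elapsed typing time, which is outside this sketch.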
Another PHP option might be the similar_text() function (docs).
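For comparison, here is a minimal sketch of similar_text(), which returns the number of matching characters and can also report a percentage through its optional third argument. The sample strings and variable names are chosen only for illustration.

```php
<?php
// Sample strings, chosen only for illustration.
$expected = 'World';
$typed    = 'Word';

// $percent is filled in by reference with the similarity as a percentage.
$common = similar_text($expected, $typed, $percent);

echo "Matching characters: $common\n";
echo 'Similarity: ', round($percent, 2), "%\n";
```

Note that similar_text() can give different results when its two arguments are swapped, so it may be worth always passing the standard paragraph first.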
The unfortunate drawback of PHP's levenshtein() function is that it cannot handle strings longer than 255 characters. As the PHP manual says:
This function returns the Levenshtein-Distance between the two argument strings or -1, if one of the argument strings is longer than the limit of 255 characters.
So, if your paragraphs are longer than that, you may be forced to implement a MySQL solution. I suppose you could instead break the paragraphs up into 255-character blocks for comparison, though I can't say definitively that this won't "break" the Levenshtein algorithm.
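One way to sidestep the length limit without chunking might be to implement the dynamic-programming algorithm directly in PHP and compare the texts word by word, which also fits a words-per-minute ranking. The sketch below is only an assumption about how that could look; the function names splitWords and wordLevenshtein are hypothetical, and the running time grows with the product of the two word counts.

```php
<?php
// Hypothetical helper: splits text into words on any run of whitespace.
function splitWords(string $text): array
{
    return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: Levenshtein distance over arrays of words,
// using two rows of the dynamic-programming table, so input length
// is not capped at 255 characters.
function wordLevenshtein(array $a, array $b): int
{
    $a = array_values($a); // make sure keys run 0..n-1
    $b = array_values($b);
    $n = count($b);
    $prev = range(0, $n);  // distances from an empty prefix of $a

    foreach ($a as $i => $wordA) {
        $curr = [$i + 1];  // distance to an empty prefix of $b
        foreach ($b as $j => $wordB) {
            $cost = ($wordA === $wordB) ? 0 : 1;
            $curr[] = min(
                $prev[$j + 1] + 1, // word missing from $b (deletion)
                $curr[$j] + 1,     // extra word in $b (insertion)
                $prev[$j] + $cost  // same word or a mistyped word
            );
        }
        $prev = $curr;
    }
    return $prev[$n];
}

// Example: one mistyped word and one skipped word.
$expected = splitWords('the quick brown fox jumps');
$typed    = splitWords('the quikc brown jumps');
echo wordLevenshtein($expected, $typed), "\n";
```

Counting whole-word errors is a design choice: a single typo and a completely wrong word both cost one edit here, which may or may not match how the institute wants to score accuracy.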
I'm not an expert in linguistic parsing and processing, so I can't say whether these are the best solutions (as you mention in your question). They are, however, very straightforward and simple to implement.

Thanks rdlowrey. Could you please look at the script at http://www.phpclasses.org/package/6220-PHP-Compares-strings-to-determine-similarity-level.html and tell me whether it can overcome the 255-character limitation of PHP's levenshtein() function? I am not sure whether the link is accessible to you without a membership. If linking to another site is a violation, please delete my post and forgive me; it was not intentional. – Samcoder Jan 27 '12 at 06:22