
I want to remove similar data from my database. Currently I am able to remove exact duplicates and keep one copy of each:

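<?php
// $conn is assumed to be an open mysqli connection created earlier.
// Disable every ad except the newest one (highest adid) for each exact adtitle.
// The derived table "x" lets MySQL read from clf_ads while updating it.
$sql = "UPDATE `clf_ads` SET `enabled` = '0' WHERE adid NOT IN (SELECT * FROM (SELECT MAX(adid) FROM clf_ads GROUP BY adtitle) x)";
if ($conn->query($sql) === TRUE) {
    echo "Duplicate records disabled successfully";
} else {
    echo "Error disabling records: " . $conn->error;
}
$conn->close();
?>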

How can I delete rows whose data is at least 80% similar?

Cœur
Shuvo
  • What do you mean by 80% similar? Are you referring to just one field in the table, or several? – markmoxx Dec 10 '18 at 17:34
  • I'm pretty sure the only way to do this would be in your PHP application code. I don't think there is any way to do this in only MySQL. – dmikester1 Dec 10 '18 at 19:05
  • It would help if you gave us a small sample of your data (10-20 records), including some that are considered 80%+ similar and would need to be removed. – dmikester1 Dec 10 '18 at 19:06
  • After glancing over the link Mark shared, it appears it may very well be possible in just MySQL. But it would still be a big help if you showed some sample data. – dmikester1 Dec 10 '18 at 19:38

1 Answer


There is no such thing as 80% similar, unless you have a specific value you're testing against. For example, if your table has the following:

-----------------------------------
|  adid |         adtitle         |
-----------------------------------
|   1   |  'this is an ad title'  |
|   2   |  'this is the ad title' |
|   3   |  'that is the ad title' |
-----------------------------------

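Then, if you're just counting the words that match position by position, ad 1 is 80% similar to ad 2 but only 60% similar to ad 3, while ad 2 is 80% similar to both ad 1 and ad 3. So ads 1 and 2 belong together, and so do ads 2 and 3, yet ads 1 and 3 do not: unlike exact equality, "80% similar" does not split the rows into clean groups.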
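
To make those percentages concrete, here is a minimal PHP sketch of that word-by-word comparison. The function name `wordSimilarity` is hypothetical, and splitting on whitespace and comparing words at the same position is just one possible reading of "counting similarity between words".

```php
<?php
// Hypothetical helper: percentage of word positions at which two titles agree.
// Assumption: titles are split on whitespace and compared position by position.
function wordSimilarity(string $a, string $b): float
{
    $wordsA = preg_split('/\s+/', trim($a), -1, PREG_SPLIT_NO_EMPTY);
    $wordsB = preg_split('/\s+/', trim($b), -1, PREG_SPLIT_NO_EMPTY);

    $longest = max(count($wordsA), count($wordsB));
    if ($longest === 0) {
        return 100.0; // two empty titles are treated as identical
    }

    $matches = 0;
    $shortest = min(count($wordsA), count($wordsB));
    for ($i = 0; $i < $shortest; $i++) {
        if ($wordsA[$i] === $wordsB[$i]) {
            $matches++;
        }
    }

    return 100.0 * $matches / $longest;
}

$ad1 = 'this is an ad title';
$ad2 = 'this is the ad title';
$ad3 = 'that is the ad title';

printf("1 vs 2: %.0f%%\n", wordSimilarity($ad1, $ad2)); // 80%
printf("1 vs 3: %.0f%%\n", wordSimilarity($ad1, $ad3)); // 60%
printf("2 vs 3: %.0f%%\n", wordSimilarity($ad2, $ad3)); // 80%
```

With an 80% threshold, ad 2 matches both neighbours while ads 1 and 3 do not match each other, which is why the result depends on which ad you compare against.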

Therefore, what you're asking for is not possible as stated, unless you single out an ad you want to keep and then compare the other titles against that title.
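
One way to "single out an ad you want to keep" is a greedy pass in application code. The sketch below is only an illustration under stated assumptions: the function name `findNearDuplicates` is hypothetical, titles are assumed to be already loaded into an array keyed by `adid`, the highest `adid` wins (mirroring the `MAX(adid)` in the question), and PHP's built-in `similar_text()` is swapped in as the similarity measure, which is character-based rather than word-based.

```php
<?php
// Hypothetical helper: returns the adids whose title is at least $threshold
// percent similar to a title that has already been kept.
// Assumption: $titlesById maps adid => adtitle, e.g. loaded from clf_ads.
function findNearDuplicates(array $titlesById, float $threshold = 80.0): array
{
    krsort($titlesById); // visit the highest adid first so it is the one kept

    $kept = [];      // adid => title of ads chosen as references
    $toDisable = []; // adids considered near-duplicates of a kept ad

    foreach ($titlesById as $adid => $title) {
        $isDuplicate = false;
        foreach ($kept as $keptTitle) {
            // similar_text() writes the similarity percentage into $percent.
            similar_text($title, $keptTitle, $percent);
            if ($percent >= $threshold) {
                $isDuplicate = true;
                break;
            }
        }

        if ($isDuplicate) {
            $toDisable[] = $adid;
        } else {
            $kept[$adid] = $title;
        }
    }

    return $toDisable;
}

// Usage with made-up sample titles.
$ids = findNearDuplicates([
    1 => 'cheap red bike',
    2 => 'cheap red bike',
    3 => 'zzzz',
]);
print_r($ids);
```

The returned ids could then be fed into an `UPDATE clf_ads SET enabled = '0' WHERE adid IN (...)` statement. Note that every title is compared against every kept title, so this gets slow on large tables, and a different visiting order can produce a different set of kept ads.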

To compare the similarity between fields in MySQL, you'll want to take a look at this question: how to compute similarity between two strings in MYSQL

markmoxx