2

I have a table of Arabic text. I want to remove duplicate rows. In view of the symbols in Arabic language: َ ِ ُ

My table: vocabulary

+----+----------+--------------------------------+
| id |   word   |              mean              |
--------------------------------------------------
| 1 |    سِلام    |              xxx               |
--------------------------------------------------
| 2 |    سَلام    |              xxx               |
--------------------------------------------------
| 3 |    سلام    |              xxx               |
--------------------------------------------------
| 4 |    سلام    |              xxx               |
+------------------------------------------------+  

Now i want this table:

+----+----------+--------------------------------+
| id |   word   |              mean              |
--------------------------------------------------
| 1 |    سِلام    |              xxx               |
--------------------------------------------------
| 2 |    سَلام    |              xxx               |
--------------------------------------------------
| 3 |    سلام    |              xxx               |
+------------------------------------------------+

How can i do that ?!

My Try:

$result = mysql_query( "SELECT * FROM vocabulary where");
while($end = mysql_fetch_assoc($result)){

    $word = $end["word"];
    $mean = $end["mean"];
    $id = $end["id"];


$result2 = mysql_query( "SELECT * FROM vocabulary where word='$word' AND mean='$mean'");
$TotalResults = mysql_num_rows($result2);

if($TotalResults>1){

     mysql_query( "DELETE FROM vocabulary WHERE id='$id'");
}

Summary: How can I sensitive MySQL to the Arabic symbols ?

Shafizadeh
  • 9,960
  • 12
  • 52
  • 89

2 Answers2

1

There are multiple ways to achieve this.

1- You can either select your rows from the database, loop through them and save the 'word' title in an array, and in each iteration in the loop, you can check if a similar value is in_array(). If the value exists, then you can save the id in another array and then use these ids to delete from the database.

2- Another way to extract the ids is to use a query similar to the below:

select count(*), id from table group by title

You can then loop through the results and delete the row (using the ids) where count is greater than 1.

The basic concept in both (and other methods) is that you just have to match the strings. Phonetics on letters change the actual string so "سَلام" is not equal to "سلام".

On a side note, there is a great Arabic PHP library you can use for various Arabic related string manipulation: PHP and Arabic Language.

This way will only remove one duplicate.

There are several other ways to do it, and it all depends on the size of the data set you have and if deleting these duplicates is a one time thing or a frequent thing because you will have to keep performance in mind.

Adon
  • 345
  • 1
  • 12
  • I'm happy to you could understand my question, I will check it and give you the result .. – Shafizadeh May 22 '15 at 16:47
  • Glad I could help. Working with Arabic can be a pain sometimes, but it is pretty straight forward if you know what you are doing, and it helps if you know the language as well. – Adon May 22 '15 at 16:51
  • You would only be deleting one duplicate per title this way. Alternately, you might want to include `title` (and `MIN(id)` instead of `id` for predictability) in the results of the "count" query, and then delete where title matches and id does not; this would guarantee the first is always preserved, and would handle cases with more than one duplicate. – Uueerdo May 22 '15 at 16:56
  • Unfortunately it did not work, Can you give me a demo on sqlfiddle ? tnx pal – Shafizadeh May 22 '15 at 16:56
  • @uureedo is right with his comment. Sajad, can you specify what didn't work? – Adon May 22 '15 at 16:59
  • @Adon this is not worked: `select count(*), id from table group by title` – Shafizadeh May 22 '15 at 17:25
  • it is better to i say: I don't know how should i use it ! Can you give me a demo on sqlfiddle ? – Shafizadeh May 22 '15 at 17:26
1

I haven't tested it, but this should work:

CREATE TEMPORARY TABLE tmp_keeps
    SELECT title, MIN(id) AS keepID
    FROM theTable 
    GROUP BY title
;

DELETE FROM theTable 
WHERE (title, id) NOT IN (
    SELECT title, keepID
    FROM tmp_keeps
    )
;

DROP TEMPORARY TABLE tmp_keeps;

It (in the subquery) gets the first id for each title, and then deletes rows that don't meet that condition.

Edit: Revised to avoid SQL error pointed out in comments.

If it is a large table, something along the lines of Adon's answer might be faster.

Uueerdo
  • 15,723
  • 1
  • 16
  • 21
  • 1
    This will not work. MySql won't let you run this query. It will produce SQL error 1093, which is reasonable considering that you are deleting from the data set you are selecting. – Adon May 22 '15 at 17:13
  • It wouldn't work. this error: _You can't specify target table 'test' for update in FROM clause_ – Shafizadeh May 22 '15 at 17:14
  • 1
    Edited, it's no longer a single query, but can still be done entirely in SQL. @Sajad, if you are not familiar with TEMPORARY tables, they only exist as long as the database connection is open, and only for that connection. – Uueerdo May 22 '15 at 17:18
  • Can you give me a demo on sqlfiddle ? – Shafizadeh May 22 '15 at 17:22