I'm trying to identify specific fields that are duplicated in a table in a mariadb-10.4.20 Joomla database. I would like to identify all rows that have a specific field duplicated, then ultimately be able to remove those duplicates, leaving just the one with the highest ID.
This table contains the IDs, titles and aliases for the articles in a joomla website. The script I'm building (in perl) will use this information to print the primary title alias and create redirects for any others.
I was previously using "group by" but it appears there's been a change recently in how it's used, and now it doesn't work properly. I don't understand the new format, and I'm not even sure it was previously working fully.
Here's a basic query that shows there are two of the same articles with different IDs:
MariaDB [mydb]> select id,alias,title from db1_content where title = "article title";
+--------+---------------+--------------+
| id | alias | title |
+--------+---------------+--------------+
| 299959 | unique-title | Unique Title |
| 300026 | unique-title | Unique Title |
+--------+------------------------------+
Here's an attempt at trying to use "group by" but it returns no results.
MariaDB [mydb]> select id,title,count(title) from db1_content group by id,title having count(title) > 1;
Empty set (0.230 sec)
If I run the same query without the id field, then it does return a list of all titles that are duplicated, along with the number of occurrences of each title.
That's not exactly what I want, though. I need it to print the id, alias and title fields so I can reference them in my perl script to subsequently perform another query to ultimately delete the duplicates and create links to be used in RewriteRules.
What am I doing wrong?