-1

I have used web scrapping to store multiple rows of data in my sql database. Now after my job was done, some of the rows contain duplicate values. Data is displayed like

Date           |    Topic Name
01-01-2023     |    Topic A
01-02-2023     |    Topic B
01-02-2023     |    Topic B
01-03-2023     |    Topic B    

My Result should look like

Date           |    Topic Name
01-01-2023     |    Topic A
01-02-2023     |    Topic B
01-03-2023     |    Topic B

I want to Delete only those rows only if both columns have duplicate values.

I have tried to use recursive method to delete rows but somewhere i am not able to achieve my task. Kindly guide me to solve this problem.

Khan
  • 11
  • 4
  • 1
    Are you using MySQL or MS SQL Server? – jarlh Aug 04 '23 at 18:05
  • I've removed the conflicting tags (and [[tag:database]] as the question isn't about them). You're need to [edit] your question to retag appropriately. – Thom A Aug 04 '23 at 18:07
  • Simple solution: SELECT DISTINCT into a new table. Drop old, rename new. – jarlh Aug 04 '23 at 18:07
  • To add onto what @jarlh says, once you copy the table you should add a unique constraint containing both columns to make sure no duplicates get added in the future – GrumpyCrouton Aug 04 '23 at 18:07
  • 1
    The best way to prevent duplicates is to make use of your database's native UNIQUE constraint feature, so that such rows don't get inserted in the first place. Otherwise, you're just going to keep getting more duplicates. – Alex Howansky Aug 04 '23 at 18:08
  • yeah, as Alex says using a unique constraint to prevent duplicate would be best, second best would be to create a second identical table, add the unique constraint over date and topic name, then insert from one to the other skipping duplicates. – Félix Adriyel Gagnon-Grenier Aug 04 '23 at 18:35
  • @Luuk that method of duplicate rows is not working. Last option is to create identical table with unique constraint. But there should be some sort of code down the line other than creating a new table. – Khan Aug 04 '23 at 18:52
  • You did not read the description of the tag [tag:sql], so I will show you a solution in sql-server: https://dbfiddle.uk/79AaHiWh – Luuk Aug 04 '23 at 19:29

1 Answers1

-2

If I understood right and you want to remove these dublicate lines from the table using php and mySQL, then I have an answer for you.

Assumption: Let's assume your table is named your_table and has columns Date and TopicName.

Identify Duplicate Rows: You can use a subquery to identify the duplicate rows based on both columns.

Delete Duplicate Rows: Once you have identified the duplicate rows, you can use a DELETE statement to remove them from the table.

Here's the SQL script to achieve this:

WITH DuplicateRows AS (
    SELECT Date, TopicName
    FROM your_table
    GROUP BY Date, TopicName
    HAVING COUNT(*) > 1
)
-- Delete duplicate rows
DELETE FROM your_table
WHERE (Date, TopicName) IN (SELECT Date, TopicName FROM DuplicateRows);

In this script:

The DuplicateRows common table expression (CTE) is used to identify the rows that have duplicate values in both the Date and TopicName columns.

The main DELETE statement then removes rows from the table where the combination of Date and TopicName matches the values in the DuplicateRows CTE.

Please be cautious when using DELETE statements, as they can permanently remove data from your database. It's recommended to take a backup of your data before performing such operations.

Remember to replace your_table with the actual name of your table in the script. Always test your queries on a smaller dataset or a backup before applying them to your production data.

  • 2
    This is not a good solution, because when you DELETE, you will be deleting all copies of the same row, in the example all rows for 01-02-2023 and Topic B will be removed, but you should keep one – nacho Aug 04 '23 at 18:30