0

I have a table with a large number of columns. Say, the table has 1000 columns from 'col1' to 'col1000'.

For duplicate criteria I don't want to use 'col1' and 'col1000'. So, if in any row values for 'col2' to 'col999' have already occurred, they are duplicates.

I tried the solution given here http://www.besttechtools.com/articles/article/sql-query-to-delete-duplicate-rows but this requires me to explicitly write all the columns I am considering for duplicates. I can't write 998 columns.

Can somebody please help me with the query in deleting such duplicates?

I want to delete such duplicates in a multiple tables.

user3664020
  • 2,980
  • 6
  • 24
  • 45
  • 1
    You can write 998 columns, because that is required for what you want to do. Either a query using `information_schema.columns` and/or a spreadsheet might help you write the query faster. – Gordon Linoff Aug 05 '15 at 11:45
  • 1
    1000 columns will make your MySQL Table **CRY :(**. I wish you would have applied **Normalization** ! – Pratik Joshi Aug 05 '15 at 11:52
  • **requires me to explicitly write all the columns** it is bad luck, but you do have to do it. – Paul Maxwell Aug 05 '15 at 11:54
  • I'm not sure what GL is smoking, but any time you have (more than 2) enumerated columns you can be sure that your schema is denormalised. Look again at normalisation. – Strawberry Aug 05 '15 at 11:58
  • Do you want to restrict this to SQL? I'd use a scripting language, read `*` for the columns, flatten this array out to a string (e.g. `print_r($row, true)` in PHP) and then hash it with MD5. Any dups are then easily detected. It would be worth adding a PK to this table prior to doing this though, so rows are individually addressable. – halfer Aug 05 '15 at 16:28

1 Answers1

0

One thing is for sure - you will have to name all the columns for the query. Now one simple approach could be to use excel to do this task easily.

1) Describe the table, you would get all column names. 2) Put all column names into Excel and cleanup the data (using Text-to-Column feature along with find/replace).

3) Once you have the columns, just build you query. You can use excel to do all sort of specific manipulation around column in one-go (using select-drag).

4) Once your queries are ready, just dump that into a text file by select the excel data (of all the cells).

5) Make sure to replace tab (\t) characters with space into text file.

You should be good to go and execute.

Gyanendra Dwivedi
  • 5,511
  • 2
  • 27
  • 53