0

I'm trying to write data to a MySql database via c #. The problem is that all columns together should be unique but are too long for a key. To check for duplicates, I currently have a select query to check whether the record already exists and that it will not be added via an insert query. Now the script runs once a day and checks several million records if they are already there and if not they will be added.

example: At 2 million, that would be 2 million select querys over x million rows and again insert querys for all those who are not duplicates.

Is there a better way to check for duplicates? As I said unique, only all columns are together. The Select needs longer and longer the more lines there are ...

foreach (var item in list)
{
    string query = "SELECT id FROM dirs WHERE Directory = \"" + item.dir + "\" AND IdentityReference = \"" + item.IdentityReference + "\" AND AccessControlType = \"" + item.AccessControlType + "\" AND FileSystemRights = \"" + item.FileSystemRights + "\" AND IsInherited = " + item.IsInherited.ToString();

    MySqlCommand commanDatabase = new MySqlCommand(query, databaseConnection);
    MySqlDataReader dataReader = commanDatabase.ExecuteReader();

    bool duplicate = false;
    while (dataReader.Read())
    {
        duplicate = true;
        break;
    }

    dataReader.Close();

    if (!duplicate)
    {
        query = "INSERT INTO dirs (Directory, IdentityReference, AccessControlType, FileSystemRights, IsInherited) VALUES ";
        query += "(\"" + item.dir + "\", \"" + item.IdentityReference + "\", \"" + item.AccessControlType + "\", \"" + item.FileSystemRights + "\", " + item.IsInherited.ToString() + ")";

        commanDatabase = new MySqlCommand(query, databaseConnection);
        commanDatabase.CommandTimeout = 60;
        commanDatabase.ExecuteNonQuery();
    }
}

2 Answers2

1

you can use a virtual column that generates you automatic an MD5 checksum from your ROW and store is in a new field with unique index.

Note: There is a little difference in the syntax between MySQL and MariaDB.

sample

youre Table

CREATE TABLE `mytable` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `Directory` varchar(128) DEFAULT NULL,
  `IdentityReference` varchar(128) DEFAULT NULL,
  `AccessControlType` varchar(128) DEFAULT NULL,
  `FileSystemRights` varchar(128) DEFAULT NULL,
  `IsInherited` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

add 2 records

MariaDB [chksum]> INSERT INTO `mytable` (`id`, `Directory`, `IdentityReference`, `AccessControlType`, `FileSystemRights`, `IsInherited`)
    ->     VALUES
    ->         (1, '/usr/local', 'yes', 'rwxrwxrwx', 'NO', 7),
    ->         (2, '/etc/var/log', 'yes', 'xxxxx', 'YES', 99);
Query OK, 2 rows affected (0.003 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [chksum]> select * from mytable;
+----+--------------+-------------------+-------------------+------------------+-------------+
| id | Directory    | IdentityReference | AccessControlType | FileSystemRights | IsInherited |
+----+--------------+-------------------+-------------------+------------------+-------------+
|  1 | /usr/local   | yes               | rwxrwxrwx         | NO               |           7 |
|  2 | /etc/var/log | yes               | xxxxx             | YES              |          99 |
+----+--------------+-------------------+-------------------+------------------+-------------+
2 rows in set (0.000 sec)

MariaDB [chksum]>

Alter your Table, add new field

MariaDB [chksum]> ALTER TABLE `mytable`
    -> ADD COLUMN
    -> `chksum` VARCHAR(32) AS (
    ->     MD5(
    ->         CONCAT(    `DIRECTORY`,'-',
    ->                 `IdentityReference`,'-',
    ->                 `AccessControlType`,'-',
    ->                 `FileSystemRights` ,'-',
    ->                 `IsInherited`
    ->             )
    ->         )
    -> )  PERSISTENT ,
    -> ADD  UNIQUE KEY `idx_chksum` (`chksum`);
Query OK, 2 rows affected (0.029 sec)
Records: 2  Duplicates: 0  Warnings: 0

see content now with new field

MariaDB [chksum]> select * from mytable;
+----+--------------+-------------------+-------------------+------------------+-------------+----------------------------------+
| id | Directory    | IdentityReference | AccessControlType | FileSystemRights | IsInherited | chksum                           |
+----+--------------+-------------------+-------------------+------------------+-------------+----------------------------------+
|  1 | /usr/local   | yes               | rwxrwxrwx         | NO               |           7 | 5b812d391703099e6fea16f8a590891e |
|  2 | /etc/var/log | yes               | xxxxx             | YES              |          99 | 99a5e08acb58b23618bdb9f40737e5cf |
+----+--------------+-------------------+-------------------+------------------+-------------+----------------------------------+
2 rows in set (0.002 sec)

MariaDB [chksum]>

insert a row without an with duplicate key

MariaDB [chksum]> INSERT INTO `mytable` (`id`, `Directory`, `IdentityReference`, `AccessControlType`, `FileSystemRights`, `IsInherited`)
    -> VALUES
    ->     (NULL, '/usr/local/XXX', 'yes', 'rwxrwxrwx', 'NO', 7);
Query OK, 1 row affected (0.002 sec)

MariaDB [chksum]> INSERT INTO `mytable` (`id`, `Directory`, `IdentityReference`, `AccessControlType`, `FileSystemRights`, `IsInherited`)
    -> VALUES
    ->     (NULL, '/usr/local', 'yes', 'rwxrwxrwx', 'NO', 7);
ERROR 1062 (23000): Duplicate entry '5b812d391703099e6fea16f8a590891e' for key 'idx_chksum'
MariaDB [chksum]>
Bernd Buffen
  • 14,525
  • 2
  • 24
  • 39
0

[suggestions]

  1. Use parameters.
  2. Why not use Insert if not exists ? (basically combining your first and 2nd queries together)
  3. Change your DB design: You could add a new field to your table which calculates hash values of your desired columns. Which then be used to check uniqueness.

Something like:

Select Sha1(concat_ws('',Directory, IdentityReference, AccessControlType, FileSystemRights, IsInherited)) As HashValue

Which can then be used to identify unique values easily.

`` OR Combining insert and duplicate checking query could look like this.

INSERT INTO dirs (Directory, IdentityReference, AccessControlType, FileSystemRights, IsInherited)
SELECT
    @Directory, @IdentityReference, @AccessControlType, @FileSystemRights, @IsInherited
FROM Dual
WHERE
    1 not in (SELECT 1 From dirs WHERE Directory = @Directory, IdentityReference = @IdentityReference, AccessControlType=@AccessControlType, FileSystemRights = @FileSystemRights );
Krish
  • 5,917
  • 2
  • 14
  • 35
  • `''` could give some false matches. Use `'\t'` or something else that is not in any of the strings. – Rick James Mar 03 '19 at 23:28
  • @RickJames What do you mean by false positives? – Krish Mar 03 '19 at 23:33
  • What if the rows were `('a','bb','cc')` and `('ab', 'bc', 'c)`. In both cases _your_ `CONCAT_WS` would generate `'abbcc'`, thereby thinking there is a dup. – Rick James Mar 03 '19 at 23:57
  • @RickJames ah I understand what you mean. but OP is trying to build a path so in any case whether ` 'c:\dir1' '\file1' or 'c' ':\dir1\file1' ` should give `c:\dir1\file1' (that's what I understood). Plus I used concat_ws to allow null fields. But OP can decide how unique his paths should be. Thanks for pointing this. – Krish Mar 04 '19 at 00:07