I got stuck with my MySQL statement. I have a huge database table (140.000 rows) with file pathes, MD5 hashes and code versions. Now I want get rows that can identify versions based on MD5 hashes of a specific file.
Something like these
Path MD5 Version
----------------------------------------------------------------------------
my/path/to/file 12B9DA49A62D8E7B21B5AA2C6C3126CB 1.2.0
my/path/to/file 12B9DA49A62D8E7B21B5AA2C6C3126CB 1.2.1
my/path/to/file FDA9A7D31ECC70F99D533D88F9EDE441 1.2.2
my/path/to/file E68CC0233A1719A17CDF5D7CFB9539A8 1.2.3
my/path/to/file E68CC0233A1719A17CDF5D7CFB9539A8 1.2.4
So version 1.2.2 can get identifed easily by file and hash since this MD5 value for the file is unique.
Now I need the SQL Statement for that query. I have tried with DISTINCT, with HAVING COUNT with GROUP BY. But no success at all. Actually I am at the beginning again with
SELECT DISTINCT
md5
FROM
source_files
INNER JOIN (
SELECT
*
FROM
source_files
) subq ON source_files.md5 = subq.md5
since I need all columns and not only the md5 column which get returned by SELECT DISTINCT md5 FROM source_files
. But this also throws an error. Also I need to group by version
since I just need a single combination / version to identify it.
Multi-File Check
Also the next level would be: If a version could not get identified by a unique MD5 hash since there is no unqiue MD5 hash for that version it should get "tested" against two files, or three files, or four... Since there is an unqiuie combination of files / MD5 hashes for a version.
Here an updated example for multi file checking
Path MD5 Version
----------------------------------------------------------------------------
my/path/to/file 12B9DA49A62D8E7B21B5AA2C6C3126CB 1.2.0
my/path/to/file 12B9DA49A62D8E7B21B5AA2C6C3126CB 1.2.1
my/path/to/file FDA9A7D31ECC70F99D533D88F9EDE441 1.2.2
my/path/to/file E68CC0233A1719A17CDF5D7CFB9539A8 1.2.3 <- same file
my/path/to/file E68CC0233A1719A17CDF5D7CFB9539A8 1.2.4 <- same file
my/path/to/file2 884F676DE41CB35951322B98654E046C 1.2.0 <- same file
my/path/to/file2 884F676DE41CB35951322B98654E046C 1.2.1 <- same file
my/path/to/file2 884F676DE41CB35951322B98654E046C 1.2.2 <- same file
my/path/to/file2 884F676DE41CB35951322B98654E046C 1.2.3 <- same file
my/path/to/file2 3182B2BEDDB1F798F66D27425B9F99D9 1.2.4
my/path/to/file2 3182B2BEDDB1F798F66D27425B9F99D9 1.2.5
So Version 1.2.3 could get identified by 4th "file" and 4th "file2"