2

I have 3 tables:

Molecule:
  id

Atom:
  id

MoleculeAtom: # Composite primary key
  molecule_id
  atom_id

My goal is to ensure that no combination of atoms which make up a molecule, are repeated. For example, the water molecule, I would store two rows in the MoleculeAtom table; 1 row for a hydrogen atom and 1 row for an oxygen atom. As you can see, I need to ensure that no other molecule has JUST hydrogen and oxygen, even though there may be other molecules which include hydrogen and oxygen.

At this point I have a query which identifies which molecules includes either hydrogen or oxygen, and only having 2 atoms in the MoleculeAtom table.

SELECT
  m.id, m.name, (SELECT count(*) from molecule_atom where molecule_id = m.id group by molecule_id) as atomCount
FROM
  molecule AS m
INNER JOIN
  molecule_atom AS ma ON ma.molecule_id = m.id
WHERE
  ma.atom_id IN (1,2)
HAVING atomCount = 2;

Which returns (demonstrative snippet):

+----+----------------------------+-----------+
| id | name                       | atomCount |
+----+----------------------------+-----------+
| 53 | Carbon Dioxide             |         2 |
| 56 | Carbon Monoxide            |         2 |
+----+----------------------------+-----------+

(I know, that both CO and CO2 have the same exact atoms, in differing quantities, but dis-regard that, as I am tracking the quantities as a another column in the same table.)

As of now I am pulling the above results and checking their atom_ids via PHP, which means I have to issue a separate query for each molecule, which seems inefficient, so I was looking to see if it's possible to do this checking using strictly SQL.

Excuse any mistakes which may be chemical related, it's been a long time since chem101.

Mike Purcell
  • 19,847
  • 10
  • 52
  • 89
  • You know that besides `H2O`, there's also `H2O2`, right? – ypercubeᵀᴹ Mar 16 '12 at 00:35
  • The molecule/atom schema abstracts what I am really working on which is why mentioned that I am not concerned with quantities, only with the principle of ensuring unique atoms across molecules. – Mike Purcell Mar 16 '12 at 00:43

4 Answers4

1

What you are asking for is a table-level constraint and these are not available in MySQL. In SQL-92 standard, there is ASSERTION, which is actually even more general (a constraint across more than 1 table). See the asnwers in this question: Why don't DBMS's support ASSERTION for details and for info about some products (MS-Access) that have such functionality with limitations.

In MySQL, you could try with a trigger to imitate such a constraint.


Update:

Firebird documentation says it allows subqueries in CHECK constraints.

Community
  • 1
  • 1
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Thanks for the response, I'll read through the linked documentation. If I switch to MS-Access, someone needs to kick my dog. – Mike Purcell Mar 16 '12 at 01:02
  • You might want to post your 'Update' here as an answer to this question: [What SQL databases support subqueries in CHECK constraints](http://stackoverflow.com/questions/6195881/what-sql-databases-support-subqueries-in-check-constraints) – onedaywhen Mar 16 '12 at 10:12
0

As ypercube mentioned, MySQL doesn't support assertions, so I ended writing a query to find all molecules having at least one of the atoms which belong to the new molecule I am trying to create, and having the same number of atoms. After querying for matches, the application steps through each molecule and determines if they have the same exact atoms as the new molecule. Query looks like this (assumes I am trying to create a new molecule with 2 atoms):

SELECT 
    m.id,
    m.name,
    (SELECT GROUP_CONCAT(ma.atom_id) FROM molecule_atom AS ma WHERE ma.molecule_id = m.id GROUP BY ma.molecule_id HAVING (SELECT COUNT(ma.atom_id)) = 2) AS atoms
FROM
    molecule AS m
INNER JOIN
    molecule_atom AS mas ON mas.molecule_id = m.id
WHERE 
    mas.atom_id IN (1,2)

Then in code (PHP) I do:

foreach ($molecules as $molecule) {

    if (isset($molecule['atoms'])) {

        $diff = array_diff($newAtomIds, explode(',', $molecule['atoms']));

        // If there is no diff, then we have a match
        if (count($diff) === 0) {
            return $molecule['name'];
        }
    }
}

Thanks for everyone's response.

Mike Purcell
  • 19,847
  • 10
  • 52
  • 89
0

A unique index might be helpful on the molecule_atom table. That would prevent duplicates at that level. You're still going to need to do some checks via SQL statements. Another option depending on the size of your list would be to load it in memory in a hash table and then run the checks from there.

Lucas
  • 1
  • I should have mentioned it, but the molecule_id and atom_id columns of the MoleculeAtom table are a composite primary key, so that satisfies the unique constraint. – Mike Purcell Mar 16 '12 at 00:28
0

The idea here is to find pairs of molecules whose lists of atoms are not the same:

select m1.molecule_id as m1id, m2.molecule_id as m2id
from molecule_atom as m1, molecule_atom as m2,
    (select atom_id from molecule_atom as m where m.molecule_id=m1id) as m1a,
    (select atom_id from molecule_atom as m where m.molecule_id=m2id) as m2a,
where m1id < m2id and (((m1a - m2a) is not null) or ((m2a - m1a) is not null))
Scott Hunter
  • 48,888
  • 12
  • 60
  • 101