So I have this project extending an existing codebase, and it's built on a MySQL database. I have a table whose tuples represent locations in a building, and I'm making a table representing edges connecting these locations in a graph for pathfinding. It's just a simple two-column table, each column holding an ID which is a foreign key referencing the locations table. So to summarize, the situation is like this:
CREATE TABLE node (
ID int(10) AUTO_INCREMENT NOT NULL,
(...)
PRIMARY KEY(ID)
);
CREATE TABLE edge(
ID_1 int(10),
ID_2 int(10),
FOREIGN KEY (ID_1) REFERENCES node(ID),
FOREIGN KEY (ID_2) REFERENCES node(ID)
);
It's an existing assumption that the graph is symmetric, i.e. if node A has an edge to node B, node B must have an edge to node A. The case of one-way doors is being ignored here but doesn't seem applicable to my domain. As such, given nodes A and B are connected, their connection could be stored in the edge table in any of three ways:
{(A,B)}
{(B,A)}
{(A,B),(B,A)}
The question I use this table to answer is "Given this node, what nodes are its neighbours?" and presently my application does that using a query like this:
SELECT * FROM node
WHERE ID IN(
SELECT ID_2
FROM edge
WHERE ID_1=?
UNION
SELECT ID_1
FROM edge
WHERE ID_2=?
);
However, my instinct was to use a CHECK constraint to ensure only the latter structure could be stored, which MySQL apparently doesn't support. From that answer, to emulate such a constraint I'd have to write triggers for every operation that keep every edge stored in both orderings. The advantage would be that it performs more as you'd naturally expect, and can answer the question with just
SELECT * FROM node
WHERE ID IN(
SELECT ID_2
FROM edge
WHERE ID_1=?
);
Once I read about triggers to emulate CHECKs I talked it over with a friend and he suggested that ensuring reflexive storage would be wasteful (doubling storage use) and it should be left to the application to treat the database correctly. Now I'm a little unsure what's actually the best solution - should the database ensure reflexivity by doubling inserts and deletes using triggers, or should the application just keep reading data not stored reflexively so that it looks like it is? Allowing my database to represent the same data in multiple ways worries me just a bit. Is that unreasonable? Does the UNION and such amount to any appreciable performance hit?
(Being aimed at reasonably small sites, this system is unlikely to get past tens of thousands of nodes, with typical nodes having at most 6 edges).