35

I have a prefix trie. What is the recommended schema for representing this structure in a relational database? I need substring matching to remain efficient.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
  • 1
    Yes, trie not tree. See http://en.wikipedia.org/wiki/Trie – dkretz Dec 10 '08 at 04:30
  • Are you storing and retrieving the trie to/from DB to be used in your code ? Because for DB lookup there are built-in tools like full-text indexing (based on similar principles) – Dmitry Khalatov Dec 10 '08 at 05:51

3 Answers3

20

How about the Materialized Path design?

CREATE TABLE trie (
  path VARCHAR(<maxdepth>) PRIMARY KEY,
  ...other attributes of a tree node...
);

To store a word like "stackoverflow":

INSERT INTO trie (path) VALUES
  ('s'), ('st'), ('sta'), ('stac'), ('stack'),
  ('stacko'), ('stackov'), ('stackove'), ('stackover'),
  ('stackover'), ('stackoverf'), ('stackoverflo'),
  ('stackoverflow');

The materialized path in the tree is the prefixed sequence of characters itself. This also forms the primary key. The size of the varchar column is the maximum depth of trie you want to store.

I can't think of anything more simple and straightforward than that, and it preserves efficient string storage and searching.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    The link redirects to nothing of interest. Here's an archived version: http://web.archive.org/web/20071019044908/http://www.dbazine.com/oracle/or-articles/tropashko4 – Howie Jun 26 '14 at 10:29
  • 1
    @Howie, thanks, I answered this 5.5 years ago, so it's not a surprise that some links go stale. – Bill Karwin Jun 26 '14 at 14:31
  • Can you give an example of how will you query this table for say "st" and there are more words like "stackoverflowone" – zengr Apr 09 '17 at 19:13
  • `SELECT path FROM trie WHERE path='st'`. If this returns a non-empty set, then "st" is in the Trie. I don't know what you're talking about on the second point. – Bill Karwin Apr 09 '17 at 21:25
  • 1
    I feel like the point of the trie is to find all words starting with st, but this only would return the one item 'st', not the strings starting with st. so is this answer not incomplete? would it be better to use like instead of `=`? – avatarofhope2 Aug 26 '22 at 22:40
  • @avatarofhope2, Correct, one would use `LIKE`. – Bill Karwin Aug 26 '22 at 23:14
0

Does any of your entities have a relationship with any other? If not, that is, not relational, a hash table with a serialization would do it.

yogman
  • 4,021
  • 4
  • 24
  • 27
  • 1
    That's really not a trie... I agree that it probably makes sense to keep it out of the db, but turning it into a hashtable? – dgtized Dec 10 '08 at 05:02
0

The above accepted answer to use the materialized path is awesome, but I think the table needs 2 more columns to be complete:

CREATE TABLE trie (
  path VARCHAR(<maxdepth>) PRIMARY KEY,
  **node_id NUMBER,**
  **parent_node_id NUMBER,**
  ...other attributes of a tree node...
);

The parent_node_id is a foreign key reference to the same table "trie", and references column "node_id". So suppose you search for the prefix "stack", if you have two matching suffix paths "overflow" and "underflow", you can find both of these child nodes using the query:

SELECT * FROM trie WHERE parent_node_id =
    (select node_id from trie where path='stack')

This will give us all the child nodes of a matching path and maintains the relations between the nodes.

Marc B
  • 53
  • 1
  • 5