I will be adding comments into my website and I want to have a tree structure so each comment can have a parent. This creates a problem when retrieving comments because each comment would have to be traversed for children and this is unacceptable when it comes to database storage.
There are various solutions to this well known performance problem and I want to use simple, and fast, path prefix scan(SELECT * FROM comments WHERE path LIKE 'commentID/%'
) where I will have index with path to each comment via all its parents and the comment's id as value.
In practice it would look like:
[comment id]/[comment id]/[comment id]/.. = [comment id]
This way I can find all comments for a certain path at once and also delete children at once when needed.
The comment id will be unsigned integer, asigned by the databse as auto-incrementing key.
The problem I am looking at is how to have the shortest possible representation of the comment id so the path index will not be too long.
I expect to use unsigned 64bit long integer which takes up 8 bytes in binary format. In case of three predecessors, the path value wil be 24 bytes + 2 bytes for delimiter. If I would use a string representation, then it depends on the size of the numbers. In case each parent's id is 5 digits long(12345), the path value would be 17 bytes long and it would be more efficient than binary representation of the numbers by 9 bytes. But once the ids go above 8 digits, the string variant becomes less efficient, it is longer, than the binary format.
So my question is if there is another way to have short representation of an integer? I was thinking in line of bijective function.