I have the following tables in an embedded H2 database:
ACE (1,655,953) | PARENT_CHILD (4,544,788) | FILE (328,584)
-------------------------------------------------------------
ID | ID | ID
MEMBER_ID | PARENT_FILE_ID | NAME
FILE_ID | CHILD_FILE_ID |
- ACE is an Access Control Entry of a FILE and has a foreign key to the FILE table and an index on MEMBER_ID
- PARENT_CHILD holds the relation between a file and ALL of its parents (not just the direct parent). It has two foreign keys to the FILE table.
- FILE holds information about a file
The application displays file/ace information in a Tree for a selected member. I need the PARENT_CHILD table to support lazy loading in the application Tree. I don't load all the ace/file information in memory, but I just query the applicable file ID's. Joining the PARENT_CHILD table gives me all the parents when a match with a deep nested file is found (the Tree must know what root element belongs to the matching files). I hold an array of file ID's in memory, which is used to build the application folder Tree. When the user starts expanding the tree I load additional information about the file and related ACE's. I don't want to display folders that don't have any matching ACE's in their content. BUT, if a deep nested folder/file has a match I need to render the full path in the Tree.
So I use this query:
select distinct parent_file_id from parent_child
inner join ace on parent_child.child_file_id = ace.file_id
where ace.member_id = 1;
This works pretty well for members that have about 40,000 matching ACE's, but when a member has many matches this query starts performing quite bad (20 seconds). I test using Squirrel (1536 MB java memory, Xmx) and the following connection string (cache size 768MB):
jdbc:h2://C:\H2DB;CACHE_SIZE=786432;QUERY_CACHE_SIZE=0;
The execution plan looks as follow:
SELECT DISTINCT
PARENT_CHILD.PARENT_ID
FROM ACE
/* ACE_MEMBER_ID_FK_INDEX_E: MEMBER_ID = 1 */
/* WHERE ACE.MEMBER_ID = 1
*/
/* scanCount: 456397 */
INNER JOIN PARENT_CHILD
/* PARENT_CHILD_FILE_ID_FILE_ID_FK_INDEX_E: CHILD_FILE_ID = ACE.FILE_ID */
/* scanCount: 6969581 */
WHERE (ACLITEM.MEMBER_ID = 1)
AND (PARENTFILE.FILE_ID = ACLITEM.FILE_ID)
/*
reads: 840206
*/
Number of results: 328,584. I assume some temp file on disk is causing the performance drop, because the join with PARENT_CHILD is generating many records. I already increased the MAX_MEMORY_ROWS to 10,000,000 and this made the query perform in 20 instead of 40 seconds.
Any ideas on how to improve this query?
The subquery on the member id takes 2.6 seconds to complete (without the 'reading result' time, 456,396 records). Which sounds also quite long to me, since it's just a query on the index:
select * from ACE where member_id = 1;
Perhaps I have to rethink the lazy loading mechanism of the application tree with folders/files. The problem is that before I can render the root elements of the tree I need to know if there are matching children files. I don't want to display folders that don't have any matching content for the selected user.
Thanks.
Edit: Perhaps I will just store all parent id's as comma separated String in the FILE record. Then I don't need the table PARENT_CHILD and in my Java app it only takes 0.8 seconds to split 500,000 Strings to 5,000,000 Ints (each String having 10 comma separated random Int values).