Ways to get hierarchical data from a table with a self joins
Nov 01, 20245 min read
1. Write multiple queries, and then merge the results.
This was actually happening with Prisma
2. Using recursive CTEs
WITH RECURSIVE ModuleHierarchy AS ( -- Base case: select root modules SELECT id, module_name, parent_id, icon, "path", display_order, CAST(id AS TEXT) AS recursive_path FROM hrm.modules WHERE parent_id IS null and modules.active_status = true UNION ALL -- Recursive case: join with ModuleHierarchy to find child modules SELECT m.id, m.module_name, m.parent_id, m.icon, m."path", m.display_order, CONCAT(h.recursive_path, '->', m.id) AS recursive_path FROM hrm.modules m INNER JOIN ModuleHierarchy h ON m.parent_id = h.id where m.active_status = true ) SELECT mh1.id, mh1.module_name, mh1.icon, mh1."path", mh1.display_order, COALESCE( JSON_AGG( JSON_BUILD_OBJECT( 'id', mh2.id, 'module_name', mh2.module_name, 'icon', mh2.icon, 'path', mh2."path", 'child_modules', COALESCE( ( SELECT JSON_AGG( JSON_BUILD_OBJECT( 'id', mh3.id, 'module_name', mh3.module_name, 'icon', mh3.icon, 'path', mh3."path" ) ) FROM ModuleHierarchy mh3 WHERE mh3.recursive_path LIKE CONCAT(mh2.recursive_path, '->%') ), '[]' ) ) ) FILTER (WHERE mh2.id IS NOT NULL), '[]' ) AS child_modules, mh1.recursive_path FROM ModuleHierarchy mh1 LEFT JOIN ModuleHierarchy mh2 ON mh2.parent_id = mh1.id WHERE mh1.parent_id IS NULL GROUP BY mh1.id, mh1.module_name, mh1.icon, mh1.recursive_path, mh1."path", mh1.display_order order by mh1.display_order, mh1.module_name
Its a good fit when the data is very dynamic
If data is not frequently changed, can add materialized paths - basically save the path hierarchy in a column and then use that.
CTEs - Common table expressions - it creates a temporary kind of table on which complex queries can be performed.
Recursive CTEs - For this you give a base or root query, and then recursive query which will be called with the old result, something like a reducer in JavaScript
There are multiple strategies to create the recursion, like path, sets, etc. having tradeoffs
3. Using Nested queries
SELECT m.id as id, m.module_name, m.icon, m."path", m.display_order, COALESCE( JSON_AGG( JSON_BUILD_OBJECT( 'id', sub.id, 'module_name', sub.module_name, 'icon', sub.icon, 'path', sub."path", 'child_modules', COALESCE( ( SELECT JSON_AGG( JSON_BUILD_OBJECT( 'id', sub2.id, 'module_name', sub2.module_name, 'icon', sub2.icon, 'path', sub2."path" ) ) from modules sub2 where sub2.parent_id = sub.id group by sub2.display_order order by sub2.display_order ), '[]' ) ) ) FILTER (WHERE sub.id IS NOT NULL), '[]' ) AS child_modulesFROM modules mjoin modules sub on m.id = sub.parent_idwhere m.parent_id is nullGROUP BY m.parent_id, m.id, m.module_name, m.icon, m."path", m.display_order, sub.display_orderorder by m.display_order, sub.display_order
No overhead of creating multiple joins for CTEs
Becomes slower and cumbersome to manage with adding more nested levels