3

Assume the following table records:

TABLE: foo
==========================
| foo_id | foo_parent_id |
==========================
| 1      | NULL          |
| 2      | NULL          |
| 3      | 1             |
| 4      | 2             |
| 5      | 1             |
| 6      | 1             |
| 7      | 2             |
| 8      | 1             |
| 9      | NULL          |
--------------------------

I want to get, say, the first 10 parent records (those records with foo_parent_id = NULL) immediately followed by, say, the first 2 child records of that parent record. So, I'm looking for a result like this:

1, NULL
3, 1
5, 1
2, NULL
4, 2
7, 2
9, NULL

How do I query something like this?

StackOverflowNewbie
  • 39,403
  • 111
  • 277
  • 441
  • See also: [What are the Options for Storing Hierarchical Data in a Relational Database?](http://stackoverflow.com/questions/4048151/). – outis Dec 26 '11 at 07:59

2 Answers2

3

Here's one idea. But it's based on lots of assumptions about the way your data is setup. Ever increasing IDs down the tree, only two levels, etc.

SELECT f.foo_id,f.foo_parent_id FROM foo f
foo f

--give me the top X number of parent_ids (This is good, you just adjust the LIMIT 10 to vary the number of parent levels to show)

INNER JOIN 
(select foo_id from foo where foo_parent_id is null order by foo_parent_id 
LIMIT 10
) top_foo_parent
      on isnull(f.foo_parent_id,f.foo_id) = top_foo_parent.foo_id
WHERE

(This part is kind of hacky, as you have to put an ever longer string of these to get past two children)

--it's the first child, or...

(f.foo_id in (select MIN(foo_id) from foo fc1 where fc1.foo_parent_id =f.foo_parent_id)
 )
 or

--it's the second child, or...

(f.foo_id in (select MIN(foo_id) from foo fc1 where fc1.foo_parent_id =f.foo_parent_id  and fc1.foo_id not in (select MIN(foo_id) from foo fc2 where fc2.foo_parent_id=f.foo_parent_id))
 )
 or 

--it's the parent

 f.foo_parent_id is null
order by isnull(f.foo_parent_id,f.foo_id)*100 + f.foo_id

So what we're doing here is basically ordering by the parent_id column and then the child columns underneath it with a slight twist. If the parentid column is NULL then we use the actual ID. This means that for ordering purposes our table looks like this:

==============================================================================
| foo_id | foo_parent_id |   isnull(f.foo_parent_id,f.foo_id)
==============================================================================
| 1      | NULL           |         (1)
| 2      | NULL           |         (2)
| 3      |  1             |         1
| 4      |  2             |         2
| 5      |  1             |         1
| 7      |  2             |         2
----------------------------------------------------------------------

Then we multiply that ordering column *100

==============================================================================
| foo_id | foo_parent_id |   isnull(f.foo_parent_id,f.foo_id)*100
==============================================================================
| 1      | NULL           |         100
| 2      | NULL           |         200
| 3      |  1             |         100
| 4      |  2             |         200
| 5      |  1             |         100
| 7      |  2             |         200
----------------------------------------------------------------------

and lastly we add our foo_id column to it

==============================================================================
| foo_id | foo_parent_id |   isnull(f.foo_parent_id,f.foo_id)*100 + foo_id
==============================================================================
| 1      | NULL           |         101
| 2      | NULL           |         202
| 3      |  1             |         103
| 4      |  2             |         204
| 5      |  1             |         105
| 7      |  2             |         207
----------------------------------------------------------------------

Now we order the table by that virtual column and...

==============================================================================
| foo_id | foo_parent_id |   ORDER BY isnull(f.foo_parent_id,f.foo_id)*100 + foo_id
==============================================================================
| 1      | NULL           |         101
| 3      |  1             |         103
| 5      |  1             |         105
| 2      | NULL           |         202    
| 4      |  2             |         204
| 7      |  2             |         207
----------------------------------------------------------------------

There we go!

TetonSig
  • 2,189
  • 15
  • 21
0

I'd like to offer another approach at answering this question:

SET @topN = 1;
SELECT foo_id, foo_parent_id
FROM (
    SELECT 
          f.foo_id
        , f.foo_parent_id
        , IFNULL(f.foo_parent_id, f.foo_id) AS grp_rank
        , CASE 
            WHEN f.foo_parent_id IS NULL 
                THEN @topN:= 1 
            ELSE 
                @topN:=@topN+1 
            END topN_rank
        , f_parent.foo_id AS f_parent_foo_id
    FROM 
        foo AS f
        RIGHT JOIN (
            SELECT foo_id 
            FROM foo 
            WHERE foo_parent_id IS NULL 
            ORDER BY foo_id LIMIT 10
        ) f_parent
        ON f_parent.foo_id = IFNULL(f.foo_parent_id, f.foo_id)
    ORDER BY grp_rank, f.foo_id
) AS foo_derived
WHERE topN_rank <= 3

General Notes:

I want to get, say, the first 10 parent records <-- SUBQUERY "f_parent"

followed by, say, the first 2 child records of that parent record. <-- topN <= 3 (PARENT + 2 CHILDREN)

How it works:

  1. Create an extra column for foo_id to foo_parent_id integration grp_rank
  2. Join to a subquery pulling only the top 10 parents (assuming the order is by their foo_id values)
  3. Remember to sort by that extra column followed by foo_id for 'parent..followed by..first two children'
  4. Create another column than ranks the rows that will be sorted resetting via foo_parent encounters. NULL foo_parent_id resets the ranking. This new column topN_rank is impacted by the sorting.
  5. Nest that query to be able to derive topN_rank and to return only the needed columns.
  6. Filter by topN_rank column to get the top N rows for each group, prohibiting more than 3 per group.
Nonym
  • 6,199
  • 1
  • 25
  • 21