Recursion in SQL

Recursion if you've never heard the term is loosely defined as when a programming function calls itself. Conceptually recursion can be difficult to grasp, at least according to my classmates back when we were learning it in college. In most programming languages this isn't actually that difficult to do. However, every time I need to do this in SQL I always have to look up how I did it the last time to refresh my memory.

The first question people are going to ask is: When would I need this? Let's say for example you have a table with fields for a name, and a parent. Could be people, folders, menu items or anything else that has a tree structure. When building the data structure you don't want to have separate tables for root nodes, leaf level 1, leaf level 2, etc. Nor would you want to build queries that only look at specific levels of depth. The only time I might make an exception to this is if the software spec specifically states that there would only ever be one sublevel, but realistically, specs change often and future proofing it by allowing for multi-level drill-down may be the right path to take.

I'm going to call our example 'categories'. We can have root level categories and some number N of subcategories which could be directly under those root categories or other sub-categories. The total depth of the tree is arbitrary. This method could allow for a depth of 1, 100, or any number N although I suppose for very large datasets there could be some performance issues.

Let's first define our table and put some data in it:

CREATE TABLE [dbo].[Categories](
    [CAT_id] [int]  NOT NULL,
    [Description] [varchar](50) NOT NULL,
    [parent_id] [int] NULL
    )
go

Insert into categories (CAT_id,description,parent_id)
Values(1,'Root 1',NULL),
    (2,'root 2',NULL),
    (3, 'child 1-1',1),
    (4, 'root 3',NULL),
    (5, 'child 2-1',2),
    (6, 'child 1-2',1),
    (7, 'child 1-2-1',6),
    (8, 'child 1-1-1',3)

The first thing to note is that the order that items were entered should not matter when we are retrieving these.

If we do a select * we get the following:

CAT_id    Description    parent_id
1        Root 1        NULL
2        root 2        NULL
3        child 1-1    1
4        root 3        NULL
5        child 2-1    2
6        child 1-2    1
7        child 1-2-1    6
8        child 1-1-1    3

When we get our results it could be helpful to know the hierarchy level (picture the results as a hierarchical table of contents), and just for demonstration purposes, I'm going to build a string for the 'row' so we can see how things line up. For conceptual purposes, if this was a slide-in menu on a webpage, think of 'Row 0' as being THE web page itself, then we display row 0-1 first and row 0-1-1 would be next and indented and so on.

We will start by finding all the root level categories:

select cat_id, parent_id
,0 as hierarchy_level
, cast(concat(0,'-',row_number() over(order by cat_id)) as varchar(20)) as row
, description
from categories
where parent_id is null

Which results in:

cat_id    parent_id    hierarchy_level    row    description
1        NULL        0                0-1    Root 1
2        NULL        0                0-2    root 2
4        NULL        0                0-3    root 3

Now we need to add the recursion to return the remaining rows. To do this we begin the code by using the keyword 'WITH' and naming the temporary table 't1'. in SQL this is referred to as Common Table Expressions. The statement that now makes the query recursive is that we will perform a UNION between the query which gives us the root level with the lower levels, which is a join of the query we are defining and the original categories table. In other words, the definition of the query includes a call to itself which results in the query running multiple times to travel down the branches of the tree that we conceptually defined.

with t1 as (
select cat_id, parent_id
,0 as hierarchy_level
, cast(concat(0,'-',row_number() over(order by cat_id)) as varchar(20)) as row
, description
from categories
where parent_id is null

union all

select c.cat_id
, c.Parent_id
, hierarchy_level+1
,cast(concat(t1.row,'-',  row_number() over(order by c.cat_id))  as varchar(20)) as row
,c.description
from categories c inner join t1 on c.parent_id=t1.cat_id
) --ends t1

select * from t1 
order by row

After we define t1, we need a line of code to select its results and we want to order by the 'row' column to show the results in 'menu' order:

cat_id    parent_id    hierarchy_level    row        description
1        NULL        0                0-1        Root 1
3        1            1                0-1-1    child 1-1
8        3            2                0-1-1-1    child 1-1-1
6        1            1                0-1-2    child 1-2
7        6            2                0-1-2-1    child 1-2-1
2        NULL        0                0-2        root 2
5        2            1                0-2-1    child 2-1
4        NULL        0                0-3        root 3

Once we have the basic SQL, if we want to start traversing the tree deeper than the root level, all we need to do is change the starting point in the first half of the CTE. So if we wanted only the tree down from root 1 we would change the where clause to be parent_id = 1 instead of parent_id is null.

Additionally, if we wanted to short-circuit the system and only find at most 2 (or some other arbitrary number) deep we could modify the second part of the t1 table to include a where clause looking for t1.hierarchy_level < N.

I'll conclude with a word of caution. Recursion is a looping mechanism. That means that an improperly formatted recursive statement could cause an endless loop. While I haven't experienced this when programming SQL, I have had it happen when traversing a file system recursively and a symbolic link deep in the folder structure sent me back up the tree accidentally.