Testing SQL Query Results

Goal: Write tests for a query, when the query doesn't write to any tables.

6 Degrees of Kevin Bacon Recursive CTE

Setup

It's just another Monday at StreamBuster, the buzzy tech startup that's disrupting the streaming video market. At StreamBuster we believe that there's a billion dollar market for the game-changing convenience that comes from driving to a store, picking up a physical disk (if there's a good one in stock anyway) and driving home with it to enjoy with friends and family.Think of the thrill of the hunt, when a movie you want to watch is in stock. Think about how much accomplishment you'll feel about having "brought home the bacon" to an adoring spouse and children. Streaming is dead, people want StreamBuster stores.

Your boss is starting the day late again, it's 1:00PM and she's active 🟢 in Slack for the first time today. It's bad news when she's starting this late, it usualy means she was out late partying with the sales team- and is bloated with "really good ideas" for you to execute on. Oh no, here it comes...

> Sandy Holcombs is typing...
> Hey there!  Was out with the sales team last night, and Chad had a brilliant idea to drive more sales.  We're going to re-plan all of the stores, so that at the center of the store are all the movies featuring Kevin Bloom (I love his movies🍿).  Then movies featuring people he's co-stared with will be in the next row out, then people who were featured in movies with people who were featured in movies with Kevin will be in the row after that, and so on.  It will be so easy for customers to find the movie they're looking for once the stores are organized by Bacon-index.  This is going to be huge.  We've applied for a patent to make sure that those annoying public libraries don't copy it.
>
> I need you to make a list of every actor from every film, and include the Bacon number so that the store managers and staff can work over the holiday weekend to get their stores in shape.
>
> K, thx, bye.
            

You knew it. Better get to work. The last time Holcombs had a genius idea it was more work to talk her out of it than to just do the work. Anyway, this does sound like a fun assignment (for you anyway- pity to the poor store managers and their holiday plans). Since the stores can only hold four rows of movies, this will be "four degrees of Kevin Bloom."

See the SQL Testing Framework 101 tutorial for database setup instructions to follow along on your computer.

Get to work

The relevent parts of the database at StreamBuster are setup like this:

postgres=# \d actor
                                        Table "public.actor"
   Column    |            Type             | Nullable |                 Default
-------------+-----------------------------+----------+-----------------------------------------
 actor_id    | integer                     | not null | nextval('actor_actor_id_seq'::regclass)
 first_name  | character varying(45)       | not null |
 last_name   | character varying(45)       | not null |
 last_update | timestamp without time zone | not null | now()

postgres=# \d film_actor
                     Table "public.film_actor"
   Column    |            Type             | Nullable | Default
-------------+-----------------------------+----------+---------
 actor_id    | smallint                    | not null |
 film_id     | smallint                    | not null |
 last_update | timestamp without time zone | not null | now()
postgres=# \d film

                                      Table "public.film"
      Column      |            Type             | Nullable |                Default
------------------+-----------------------------+----------+---------------------------------------
 film_id          | integer                     | not null | nextval('film_film_id_seq'::regclass)
 title            | character varying(255)      | not null |
 description      | text                        |          |
 release_year     | year                        |          |
 length           | smallint                    |          |
 rating           | mpaa_rating                 |          | 'G'::mpaa_rating
 last_update      | timestamp without time zone | not null | now()

We can figure this out using the actor and film_actor tables, then just join-in the film table so that the store managers know the film title, but it's going to be a complex query, and require use to use a RECURSIVE common table expression (CTE)- which we will certainly get wrong at first and will be very hard to know if we're right. SQL Testing Framework will help us here!

In picture form, we're looking to get a result that looks something like this:

A monster query

After several hours, you come up with this monster of a query:

-- Find four degrees of Kevin Bloom
--
WITH RECURSIVE kevin_bloom AS (
  SELECT
     actor_id
    ,first_name
    ,last_name
  FROM actor
  WHERE first_name ILIKE 'KEVIN'
    AND last_name ILIKE 'BLOOM'
), degrees_of_kevin_bloom as (
  SELECT
     0 AS degree
    ,film_actor.film_id
    ,kevin_bloom.actor_id
  FROM kevin_bloom
  JOIN film_actor
    ON film_actor.actor_id = kevin_bloom.actor_id

  UNION

  SELECT
    CASE film_actor.actor_id
      WHEN dokb.actor_id THEN degree
      ELSE degree + 1
    END
    ,film_actor.film_id
    ,NULLIF(film_actor.actor_id, dokb.actor_id) AS actor_id
  FROM degrees_of_kevin_bloom AS dokb
  JOIN film_actor
    ON (
         film_actor.film_id  = dokb.film_id
      OR film_actor.actor_id = dokb.actor_id
    )
  WHERE degree < 4
), summary AS (
  SELECT
     MIN(degree) AS degree
    ,dokb.actor_id
  FROM degrees_of_kevin_bloom AS dokb
  WHERE dokb.actor_id IS NOT NULL
  GROUP BY dokb.actor_id
)

SELECT
  summary.degree
  ,actor.first_name
  ,actor.last_name
  ,actor.actor_id
FROM summary
JOIN actor
  ON actor.actor_id = summary.actor_id
ORDER BY degree, actor.first_name, actor.last_name
;

Because this query is a pure function, without side effects, the testing techniques we used in the first tutorial won't work here. We'll need to use a few more features of SQL Testing Framework. Our new tests are going to follow this form:

\set query /queries/degrees-of-kevin-bloom.sql
:setup_test
WITH text AS (
  SELECT 'Some assertion about the query result' AS value
), expect AS (
  -- SELECT something_we_expect AS value
), actual AS (
  -- SELECT something AS value
  -- FROM "/queries/degrees-of-kevin-bloom.sql"
)
:evaluate_test
:cleanup_test

Let's break it down:

\set query /queries/degrees-of-kevin-bloom.sql

With this line we are telling SQL Testing Framework the name of the file that contains our query-under-test.

:setup_test
WITH text AS (
  SELECT 'Some assertion about the query result' AS value
), expect AS (
  -- SELECT something_we_expect AS value
)

This next part is exactly the same as first tutorial, Writing SQL Tests

, actual AS (
  -- SELECT something AS value
  -- FROM "/queries/degrees-of-kevin-bloom.sql"
)

SQL Testing Framework will automatically create an ephemeral table for us that stores the result (while the tests are running). The table has the same name as the filename of the query-under-test, so we can reference it as FROM "/queries/degrees-of-kevin-bloom.sql".

:evaluate_test
:cleanup_test

Finally, we tell SQL Testing Framework to run the tests and clean up the ephemeral table and variables.

Writing the tests

Let's start with a simple sanity check. For our purpose, actors can only be a single n-degree connection. So let's make sure that the actor_id can only appear once in the query result:

\set query /queries/degrees-of-kevin-bloom.sql
:setup_test
WITH text AS (
  SELECT 'Each actor can only be in a single "degree" group' AS value
), expect AS (
  SELECT
    COUNT(actor_id) AS value
  FROM "/queries/degrees-of-kevin-bloom.sql"
), actual AS (
  SELECT
    COUNT(DISTINCT actor_id) AS value
  FROM "/queries/degrees-of-kevin-bloom.sql"
)
:evaluate_test
:cleanup_test

This an encouraging result, our query doesn't produce any duplicate actors.

 actual     |    expect     | did_pass |                        text
---------------+---------------+----------+-----------------------------------------------------
 200           | 200           | t        | Each actor can only be in a single "degree" group
(1 row)

Now we can move on to a more complicated verification. We'll check that the first-degree connections are all right.

\set query /queries/degrees-of-kevin-bloom.sql
:setup_test
WITH text AS (
  SELECT 'The first degree actors are correct' AS value
), kevin_bloom AS (
  SELECT
     actor_id
  FROM actor
  WHERE first_name ILIKE 'KEVIN'
    AND last_name ILIKE 'BLOOM'
), kevin_bloom_films AS (
  SELECT
    film_actor.film_id
  FROM film_actor
  JOIN kevin_bloom
    ON kevin_bloom.actor_id = film_actor.actor_id
), first_degree_actors AS (
  SELECT DISTINCT ON (film_actor.actor_id)
    film_actor.film_id
    ,film_actor.actor_id
    ,actor.first_name
    ,actor.last_name
  FROM film_actor
  JOIN kevin_bloom_films
    ON kevin_bloom_films.film_id = film_actor.film_id
  JOIN actor
    ON actor.actor_id = film_actor.actor_id
  LEFT OUTER JOIN kevin_bloom -- need to remove Kevin himself.
    ON kevin_bloom.actor_id = film_actor.actor_id
  WHERE kevin_bloom.actor_id IS NULL
  ORDER BY film_actor.actor_id
), expect AS (
  SELECT
    -- bunch up all the actor_ids for comparison
    ARRAY_AGG(actor_id::integer ORDER BY actor_id) AS value
  FROM first_degree_actors
), actual AS (
  SELECT
    ARRAY_AGG(actor_id::integer ORDER BY actor_id) AS value
  FROM "/queries/degrees-of-kevin-bloom.sql"
  WHERE degree = 1
)
:evaluate_test
:cleanup_test

This gives us a good feeling about the first-degree connections:

 actual     |    expect     | did_pass |                        text
---------------+---------------+----------+-----------------------------------------------------
 200           | 200           | t        | Each actor can only be in a single "degree" group
 {2,5,8…8,200} | {2,5,8…8,200} | t        | The first degree actors are correct
(2 rows)
If the test is passing the SQL Testing Framework automatically truncates the actual and expect displays. If the test had failed, we would see the full results of both.

Second-degree connections are anyone who was in a movie with a first-degree connection (assuming they themselves are not a first degree connection). Verifying the second-degree actors is just another iteration of the first degree test:

\set query /queries/degrees-of-kevin-bloom.sql
:setup_test
WITH text AS (
  SELECT 'The second degree actors are correct' AS value
), kevin_bloom AS (
  SELECT
     actor_id
  FROM actor
  WHERE first_name ILIKE 'KEVIN'
    AND last_name ILIKE 'BLOOM'
), kevin_bloom_films AS (
  SELECT
    film_actor.film_id
  FROM film_actor
  JOIN kevin_bloom
    ON kevin_bloom.actor_id = film_actor.actor_id
), first_degree_actors AS (
  SELECT DISTINCT ON (film_actor.actor_id)
    film_actor.film_id
    ,film_actor.actor_id
    ,actor.first_name
    ,actor.last_name
  FROM film_actor
  JOIN kevin_bloom_films
    ON kevin_bloom_films.film_id = film_actor.film_id
  JOIN actor
    ON actor.actor_id = film_actor.actor_id
  LEFT OUTER JOIN kevin_bloom -- need to remove Kevin himself.
    ON kevin_bloom.actor_id = film_actor.actor_id
  WHERE kevin_bloom.actor_id IS NULL
  ORDER BY film_actor.actor_id
-- NEW STUFF STARTS HERE --
), first_degree_films AS (
  SELECT
    film_actor.film_id
  FROM film_actor
  JOIN first_degree_actors
    ON first_degree_actors.actor_id = film_actor.actor_id
), second_degree_actors AS (
  SELECT DISTINCT ON (film_actor.actor_id)
    film_actor.film_id
    ,film_actor.actor_id
    ,actor.first_name
    ,actor.last_name
  FROM film_actor
  JOIN first_degree_films
    ON first_degree_films.film_id = film_actor.film_id
  JOIN actor
    ON actor.actor_id = film_actor.actor_id
  LEFT OUTER JOIN kevin_bloom -- need to remove Kevin himself.
    ON kevin_bloom.actor_id = film_actor.actor_id
  LEFT OUTER JOIN first_degree_actors -- need to remove first-degree actors
    ON first_degree_actors.actor_id = film_actor.actor_id
  WHERE kevin_bloom.actor_id IS NULL
    AND first_degree_actors.actor_id IS NULL
  ORDER BY film_actor.actor_id
), expect AS (
  SELECT
    -- bunch up all the actor_ids for comparison
    ARRAY_AGG(actor_id::integer ORDER BY actor_id) AS value
  FROM second_degree_actors
), actual AS (
  SELECT
    ARRAY_AGG(actor_id::integer ORDER BY actor_id) AS value
  FROM "/queries/degrees-of-kevin-bloom.sql"
  WHERE degree = 2
)
:evaluate_test
:cleanup_test

Now we can see the test result for the second degree actors is also correct

 actual     |    expect     | did_pass |                        text
---------------+---------------+----------+-----------------------------------------------------
 200           | 200           | t        | Each actor can only be in a single "degree" group
 {2,5,8…8,200} | {2,5,8…8,200} | t        | The first degree actors are correct
 {1,3,4…7,199} | {1,3,4…7,199} | t        | The second degree actors are correct
(3 rows)

From here there are many more things we could verify and we may want to optimize the query- but now if we change the query we can simply run the tests again to verify that it's still correct. SQL Testing Framework has given us a simple workflow to check a complex query, that would have required extensive manual verification!

The code for this example is available on Github

Up next: Precheck assertions help you write better tests. Learn how