Snowflake Joins

Paste and edit existing SQL Select statements

The tool allows you to quickly understand the structure and possible problems with a syntactically correct SQL which produces ex a cartesian join. The SQL will be parsed, tables with aliases, columns, joins, WHERE clauses, subselects, GROUP BY will be extracted and renderred into a diagram. The uniqueness of this site is that parsing requires no access to your physical database, or to it's create script. This also means that you have to qualify your columns with a table or alias prefix.

This approach was introduced by the open source Reverse Snowflake Joins project in August 2008, with an online demo on this website since 2008. The tables are layed out automatically using Graphviz.

Cartesians joins, loops and typos

The query bellow has a typo, S.Geography_Id = G.Id instead of S.Geography_Id = C.Id. How fast can you spot the error in the unformatted SQL?

SELECT B.Brand, G.Country, SUM (F.Units_Sold) FROM Fact_Sales F INNER JOIN Dim_Date D ON F.Date_Id = D.Id INNER JOIN Dim_Store S ON F.Store_Id = S.Id INNER JOIN Dim_Geography G ON S.Geography_Id = C.Id INNER JOIN Dim_Product P ON F.Product_Id = P.Id INNER JOIN Dim_Product_Category C ON P.Product_Category_Id = C.Id INNER JOIN Dim_Brand B ON P.Brand_Id = B.Id WHERE D.Year = 1997 AND C.Product_Category = 'tv' GROUP BY B.Brand, G.Country

Left and right joins

Left and right joins are easily spotted, the colored edge being wider towards the table containing also NULL values. For example the path from fact right join d2 left join d22 contains left and right joins, and most probably would return incorrect results

SELECT *
FROM fact LEFT JOIN d1 ON fact.k1=d1.k1 RIGHT JOIN d11 ON d11.k11=d1.k11
RIGHT JOIN d2 ON fact.k2=d2.k2 LEFT JOIN d22 ON d22.k22=d2.k22
LEFT JOIN d3 ON fact.k3=d3.k3 RIGHT JOIN d33 ON d33.k33=d3.k33

Chasm traps

Chasm traps occur when one table is connected to two or more tables by 1-to-N relationships. This causes multiplication of rows. For example one bad way is to directly try to compare actual sales against sales targets by products.

SELECT
p.product, SUM(o.sales), SUM(sp.sales)
FROM
products p INNER JOIN orders o ON p.pid = o.pid
INNER JOIN salesplans sp on p.pid = sp.pid
GROUP BY p.pid

Fan traps

Fan traps are caused when joining one table by 1-to-N relationship to a second table that is joined to a third table by another 1-to-N relationship. This again is causing multiplication of rows. For example taking the sum of amounts from the orders table and the count of shipped items from the lineitem table.

SELECT
c.customer, SUM(o.ordertotal), SUM(l.quantity)
FROM customer c INNER JOIN orders o ON c.cid = o.cid
INNER JOIN orderlines l ON o.oid = l.oid
GROUP BY c.customer;

Write a SELECT to quickly document or design database schemas

Following example is taken from the tables from Fossil SCM, plus the inexistent Payments table and a few comments. Easy to write SQL syntax, you can add whatever comments you like and you are ready to show it to your coleagues ex using an overhead projector. Makes technical meetings more productive.

SELECT
payment.login, payment.timestamp, payment.amount, payment.status,
user.uid, user.login, user.pw, user.cap, user.cookie, user.cexpire, user.info, user.mtime, user.photo,
ticketchng.tkt_id, ticketchng.rkt_rid, ticketchng.mtime, ticketchng.login, ticketchng.username, ticketchng.mimetype, ticketchng.icomment
FROM
user INER JOIN payment ON user.login = payment.login INNER JOIN ticketchng ON ticketchng.login = user.login
WHERE
user.photo='field not used' AND ticketchng.username='field not used'

Constraint optimizer might reoder the columns to minimize edge crossings

Sometimes the joining columns between tables have different alphabetical order in each table, leading to unnecessary edge crossigs. These can be minimised by re-ordering the columns in some tables.

SELECT
t1.a FROM t1 JOIN t2 ON t1.a=t2.z and t1.b=t2.y and t1.c=t2.x
JOIN t3 ON t2.x=t3.n and t2.y=t3.m
JOIN t4 ON t3.n=t4.i and t3.o = t4.j
JOIN t5 ON t3.o=t5.x