dbt column level lineage#

This project computes the column lineage of the dbt testing project jaffle_shop thanks to SQLGlot.

dbt model#

As an example of the lineage, let’s take the model stg_orders and the output column customer_id:

with source as (

    select * from {{ ref('raw_orders') }}

),

renamed as (

    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status

    from source

)

select * from renamed

In this model:

  • first CTE imports all columns from the source with a select *
  • column user_id is renamed to customer_id in a followinf CTE
  • finally, the last select is also a select * from the renaming CTE

➡️ column name is not explicit in the import, neither in the export, and column name output differs from input

Computed lineage#

The script has been run on the whole dbt project and produced its columns lineage as this json file.

This is the extract of the lineage for customer_id:

{
   "customer_id":{
      "intermediate_columns":[
         "customer_id",
         "renamed.customer_id",
         "source.user_id",
         "*"
      ],
      "ancestor_columns":[
         {
            "db":"main",
            "catalog":"jaffle_shop",
            "table":"raw_orders",
            "column":"user_id"
         }
      ]
   }
}

As expected, we get:

  • the final column output is customer_id
  • intermediate column names are *, user_id, and customer_id
  • the ancestor column is user_id, from the database main, dataset jaffle_shop, table raw_orders

Now time to traverse all child nodes using dbt table lineage 🔥

Test it on your own dbt manifest#

Source and instructions on github.