dbt column level lineage#
This project computes the column lineage of the dbt testing project jaffle_shop thanks to SQLGlot.
dbt model#
As an example of the lineage, let’s take the model stg_orders and the output column customer_id:
with source as (
select * from {{ ref('raw_orders') }}
),
renamed as (
select
id as order_id,
user_id as customer_id,
order_date,
status
from source
)
select * from renamed
In this model:
- first CTE imports all columns from the source with a
select * - column
user_idis renamed tocustomer_idin a followinf CTE - finally, the last select is also a
select *from the renaming CTE
➡️ column name is not explicit in the import, neither in the export, and column name output differs from input
Computed lineage#
The script has been run on the whole dbt project and produced its columns lineage as this json file.
This is the extract of the lineage for customer_id:
{
"customer_id":{
"intermediate_columns":[
"customer_id",
"renamed.customer_id",
"source.user_id",
"*"
],
"ancestor_columns":[
{
"db":"main",
"catalog":"jaffle_shop",
"table":"raw_orders",
"column":"user_id"
}
]
}
}
As expected, we get:
- the final column output is
customer_id - intermediate column names are
*,user_id, andcustomer_id - the ancestor column is
user_id, from the databasemain, datasetjaffle_shop, tableraw_orders
Now time to traverse all child nodes using dbt table lineage 🔥
Test it on your own dbt manifest#
Source and instructions on github.
Read other posts