
Developer Blog | dbt Developer Hub

Find tutorials, product updates, and developer insights in the dbt Developer Blog.

Start here

· 5 min read
Andrew Escay

Without a command to run them, dbt models and tests are just taking up space in a Git repo.

The specific dbt commands you run in production are the control center for your project. They are the structure that defines your team’s data quality + freshness standards.

· 6 min read
Sanjana Sen
Jason Ganz
David Krevitt

We’ve all done it: fanned out data during a join to produce duplicate records (sometimes duplicated several times over).

That time when historical revenue numbers doubled on Monday? Classic fanout.

Could it have been avoided? Yes, very simply: by defining the uniqueness grain for a table with a primary key and enforcing it with a dbt test.

So let’s dive deep into: what primary keys are, which cloud analytics warehouses support them, and how you can test them in your warehouse to enforce uniqueness.
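To make the fanout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`orders`, `payments`, `order_payments`) and the helper `duplicate_keys` are hypothetical and exist only for illustration. The helper runs a group-by check in the spirit of a uniqueness test: any row it returns is a key value that appears more than once.

```python
import sqlite3

# In-memory database with hypothetical tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer, amount integer);
    create table payments (payment_id integer, order_id integer);

    insert into orders values (1, 100), (2, 50);
    -- Order 1 has two payments, so joining on order_id fans it out.
    insert into payments values (10, 1), (11, 1), (12, 2);

    create table order_payments as
        select o.order_id, o.amount
        from orders o
        join payments p on p.order_id = o.order_id;
""")


def duplicate_keys(conn, table, key):
    """Return (key value, row count) pairs for keys that appear more than once."""
    query = (
        f"select {key}, count(*) from {table} "
        f"group by {key} having count(*) > 1"
    )
    return conn.execute(query).fetchall()


# Revenue summed at the correct grain vs. after the fanned-out join.
true_revenue = conn.execute("select sum(amount) from orders").fetchone()[0]
fanned_revenue = conn.execute("select sum(amount) from order_payments").fetchone()[0]

print(true_revenue, fanned_revenue)
print(duplicate_keys(conn, "order_payments", "order_id"))
```

The joined table counts order 1 twice, which inflates the revenue total. The uniqueness check on `order_id` surfaces that duplicate before it reaches a report.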

· 7 min read
Sanjana Sen
Jason Ganz
David Krevitt

Why primary keys are important

We all know one of the most fundamental rules in data is that every table should have a primary key. Primary keys are critical for many reasons:

  • They ensure that you don’t have duplicate rows in your table
  • They help establish relationships to other tables
  • They allow you to quickly identify the grain of the table (e.g., the customers table with a PK of customer_id has one row per customer)
  • You can test them in dbt, to ensure that your data is complete and unique

· 9 min read
Joel Labes

As we get closer to dbt v1.0 shipping in December, it's a perfect time to get your installation up to scratch. dbt 1.0 represents the culmination of over five years of development and refinement to the analytics engineering experience - smoothing off sharp edges, speeding up workflows and enabling whole new classes of work.

Even with all the new shinies on offer, upgrading can be daunting – you rely on dbt to power your analytics workflow and can’t afford to change things just to discover that your daily run doesn’t work anymore. I’ve been there. This is the checklist I wish I had when I owned my last company’s dbt project.

· 11 min read
Amy Chen

Packages are the easiest way for a dbt user to contribute code to the dbt community. This is a belief that I hold close as someone who is a contributor to packages and has helped many partners create their own during my time here at dbt Labs.

The reason is simple: packages, as an inherent part of dbt, follow our principle of being built by and for analytics engineers. They’re easy to install, accessible and at the end of the day, it’s just SQL (with sprinklings of git and jinja). You can either share your package with the community or just use it among your teams at your org.

So I challenge you after reading this article to test out your skillsets, think about the code that you find yourself reusing again and again, and build a package. Packages can be as complex as you would want; it’s just SQL hidden in the mix of reusable macros and expansive testing frameworks. So let’s get started on your journey.

· 3 min read
Jason Ganz
David Krevitt

Doing analytics is hard. Doing analytics right is even harder.

There are a massive number of factors to consider: Is data missing? How do we make this insight discoverable? Why is my database locked? Are we even asking the right questions?

Compounding this is the fact that analytics can sometimes feel like a lonely pursuit.

Sure, our data is generally proprietary and therefore we can’t talk much about it. But we certainly can share what we’ve learned about working with that data.

So let’s all commit to sharing our hard-won knowledge with each other, and in doing so pave the path for the next generations of analytics practitioners.

· 13 min read
Sung Won Chung

Airflow and dbt are often framed as either / or:

You either build SQL transformations using Airflow’s SQL database operators (like SnowflakeOperator), or develop them in a dbt project.

You either orchestrate dbt models in Airflow, or you deploy them using dbt Cloud.

In my experience, these are false dichotomies that sound great as hot takes but don’t really help us do our jobs as data people.

· 8 min read
Pat Kearns

💾 This article is for anyone who has ever questioned the sanity of a date not in ISO 8601 format

Have you ever been assigned to add new fields or concepts to an existing set of models and wondered:

  • Why are there multiple models named almost the same but slightly different?

  • Which model has the fields I need?

  • Which model is upstream or downstream from which?

· 4 min read
David Krevitt

I’ve used the dateadd SQL function thousands of times.

I’ve googled the syntax of the dateadd SQL function all of those times except one, when I decided to hit the "are you feeling lucky" button and go for it.

In switching between SQL dialects (BigQuery, Postgres and Snowflake are my primaries), I can literally never remember the argument order (or exact function name) of dateadd.

This article will go over how the DATEADD function works, the nuances of using it across the major cloud warehouses, and how to standardize the syntax variances using a dbt macro.
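As a rough illustration of the syntax variance, the sketch below renders the same "add N dateparts" expression for three dialects. The function `render_dateadd` is hypothetical and written in Python only to show the idea. It is not the dbt macro itself, and the BigQuery branch assumes a DATE value (other types use sibling functions such as `datetime_add`).

```python
def render_dateadd(dialect, datepart, interval, expr):
    """Render a dialect-specific SQL snippet that adds `interval` dateparts to `expr`.

    Hypothetical helper for illustration; argument order differs per warehouse.
    """
    if dialect == "snowflake":
        # Snowflake: dateadd(datepart, interval, expression)
        return f"dateadd({datepart}, {interval}, {expr})"
    if dialect == "bigquery":
        # BigQuery (DATE values): date_add(expression, interval N datepart)
        return f"date_add({expr}, interval {interval} {datepart})"
    if dialect == "postgres":
        # Postgres has no dateadd function; use interval arithmetic instead.
        return f"{expr} + (interval '1 {datepart}' * {interval})"
    raise ValueError(f"unsupported dialect: {dialect}")


for dialect in ("snowflake", "bigquery", "postgres"):
    print(dialect, "->", render_dateadd(dialect, "day", 7, "order_date"))
```

Wrapping the differences behind one call signature is the same design choice a cross-database macro makes: callers stop needing to remember each warehouse's argument order.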

· 3 min read
Lauren Craigie

Hi there,

Before I get to the goods, I just wanted to quickly flag that Coalesce is less than 3 weeks away! 😱 If you had to choose just ONE of the 60+ sessions on tap, consider Tristan's keynote with A16z's Martin Casado.

It has two of my favorite elements:

1) Spice 🌶️

2) Not-actually-about-us 😅

Martin and Tristan will discuss something we've all probably considered with the latest wave of innovation (and funding) in our space:

Is the modern data stack just another wave in a long string of trendy technologies, or is it somehow more permanent?

Hear their take, and share your own by registering here.

· 4 min read
Lauren Craigie

Hello there,

Do you remember? The 21st day of September? 🎶 'Course you do, it was two days ago. Well, that's a win in your bucket and the day's barely begun! So let's get a win for someone else -- like Jeremy Cohen, the dbt Core product manager.

I'm sure you know that half of the updates in this email are pushed automatically when we upgrade everyone to the latest version of dbt Cloud 🚀

But did you know the other half requires you (or your account admin) to actively switch to the latest version of dbt Core? 😱 If this isn't happening regularly (how-to video here), you may miss out on major improvements to performance, stability, and speed.

Give Jeremy a win and check out the blog he just posted on why this matters even more leading up to 💥dbt v1.0💥. While we're throwing W's, don't forget to also register for his talk at Coalesce now!

· 8 min read
Amy Chen

At dbt Labs, as more folks adopt dbt, we have started to see more and more use cases that push the boundaries of our established best practices. This is especially true for those adopting dbt in the enterprise space.

After two years of helping companies with anywhere from 20 to 10,000+ employees implement dbt & dbt Cloud, the below is my best attempt to answer the question: “Should I have one repository for my dbt project or many?” Alternative title: “To mono-repo or not to mono-repo, that is the question!”

· 9 min read
Amy Chen
More up-to-date information available

Since this blog post was first published, many data platforms have added support for materialized views, which are a superior way to achieve the goals outlined here. We recommend them over the below approach.

Before I dive into how to create this, I have to say: you probably don’t need it. I, along with my other Fishtown colleagues, have spent countless hours working with clients who ask for near-real-time streaming data. However, when we start digging into the project, we often find that the use case is not there. There are a variety of reasons why near-real-time streaming is not a good fit. Two key ones are:

  1. The source data isn’t updating frequently enough.
  2. End users aren’t looking at the data often enough.

So when presented with a near-real-time modeling request, I (and you as well!) have to be cynical.

· 10 min read
Amy Chen
Dave Connors

If you’ve been using dbt for over a year, your project is out-of-date. This is natural.

New functionalities have been released. Warehouses change. Best practices are updated. Over the last year, I and others on the Fishtown Analytics (now dbt Labs!) team have conducted seven audits for clients who have been using dbt for a minimum of 2 months.