Using dbt Core with Mozart Data

Overview

Mozart Data users can create data models with dbt Core in addition to Mozart Data's SQL-based Transformation feature. With Mozart Data as the orchestrator, you define the execution steps for each dbt job, and Mozart runs and schedules them for you.

Curious to learn more about this integration?

Check out our blog post 👉 7 Key Benefits of Integrating dbt into Mozart

Prerequisites

To begin, you must have a dbt repository on either GitHub or GitLab. You can either use a repository that already contains a dbt project, or create a new repository and connect it to Mozart.
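
If you are starting from scratch, the new-repository path might look like the sketch below. It assumes dbt Core and Git are installed locally. The project name `my_dbt_project` and the remote URL are hypothetical placeholders, and `dbt init` will prompt for warehouse connection details that are not shown here.

```shell
# Scaffold a new dbt project (hypothetical name); dbt init prompts interactively
dbt init my_dbt_project
cd my_dbt_project

# Put the project under version control and push it to an empty remote repository
git init
git add .
git commit -m "Initial dbt project"
git branch -M main
# Placeholder URL: replace with your own GitHub or GitLab repository
git remote add origin git@github.com:your-org/my_dbt_project.git
git push -u origin main
```

Once the repository exists on GitHub or GitLab, it can be connected to Mozart as described in the next section.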

Connecting dbt Project Repo to Mozart Data

To connect a dbt project repo to Mozart Data:

1. Navigate to the Integrations page and select dbt.

2. Click on 'Get Started' to begin connecting your dbt project repo.

3. Follow the provided instructions to add Mozart's public key to your Git repository and enter the repository URL. This will securely connect your dbt project to Mozart Data.

  • For detailed instructions on connecting with GitHub, refer to this documentation: GitHub
  • For detailed instructions on connecting with GitLab, refer to this documentation: GitLab

4. Click on 'Test and Save' to complete the setup.

Once the setup is successful, you will see the 'dbt' option added to your navigation bar. You are all set to start creating and running dbt jobs on Mozart!

Creating and Running dbt Jobs

Once the dbt Git repo is successfully connected to Mozart, the next step is to create and schedule dbt jobs for execution. Scheduling a job ensures that your dbt models are built and tested regularly.

1. Specify the repository where you want to run the dbt job. By default, Mozart runs the job's dbt commands against the dbt project repo that you connected. If you want to create a job for the dbt files in a different repo, go to the configuration page and click "Link a new repo". Then, select the desired repo from the dropdown menu.

2. Provide a name for your dbt job. Note that this name identifies the job itself (the set of dbt commands to execute), not a transformation model.

3. Specify the target schema. This is where the outputs of this dbt job (i.e., the transformation models) will be generated once the dbt job is completed.

4. Execution Steps for dbt Commands. Enter the dbt commands that you want to execute and arrange them in the desired order. You can easily add multiple commands by clicking on the "Add Command" button.

5. Schedule. We highly recommend setting up a schedule for your dbt jobs to ensure regular execution. There are two scheduling options:

  • After all selected connector syncs: This allows you to select which connector(s) need to complete syncing before the dbt job runs. This ensures your data is up-to-date.
  • At a specific time: This allows you to schedule the execution of the dbt job at your preferred time intervals.

Note: Currently, transforms cannot be used as ancestors in the scheduling options. However, we are continuously working on enhancements to make this feature available in the future.

6. Save.
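
As an illustration of step 4, a common sequence of execution steps is sketched below, with one command per step in the order they should run. These are standard dbt Core commands. Which of them your project needs, and the selector name `staging` mentioned in the comment, are assumptions for illustration only.

```shell
# Install packages listed in packages.yml (only needed if the project uses packages)
dbt deps
# Load CSV files from the project's seeds directory (only needed if the project has seeds)
dbt seed
# Build the models; a subset can be chosen with, e.g., dbt run --select staging
dbt run
# Run the project's tests against the built models
dbt test
```

Placing `dbt test` after `dbt run` means the tests check freshly built models rather than the results of a previous run.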

Running dbt Jobs Manually

If you need to run a dbt job outside its regular schedule, for example to make sure your transformations reflect the most current data, you can run it manually in a few simple steps:

1. Click into the dbt job that you want to run manually from the list of dbt jobs.

2. Go to the Runs tab.

3. Click Run Manually.

dbt Job Run History & Logs

If your dbt job encounters an error and requires debugging, or if you simply want to review the logs of recently executed jobs, navigate to the Run History and click "View Logs". You can copy the log to your clipboard or download it as a CSV file for future reference.

dbt Docs

You can also access the dbt Docs page by clicking on the 'Docs' tab. This is where you can navigate through your dbt models and get more detailed information on your dbt projects, such as the owners, materialization types, any packages, and more.

 


About dbt

dbt™ is the emerging industry standard platform for analytics engineering in the modern data stack. It is used by thousands of companies including JetBlue, HubSpot, and Cisco, as well as data teams at startup and growth stage companies. The dbt Community is one of the most active developer Slack communities and hosts Meetups on five continents. dbt, dbt Core, and dbt Cloud are trademarks of dbt Labs, Inc.