WhatsApp Image 2023-11-27 at 3.11.17 PM

Mastering Data Integrity : the Crucial Role of DBT Validation in Your Workflow

Validations: –

Validations in dbt refer to the process of checking the quality, integrity, and correctness of transformed data using SQL queries. These checks help ensure that the data produced by your DBT models adheres to business rules, data quality standards, and other criteria. DBT allows you to automate these checks, report issues, and take action based on the validation results.

Benefits of Tests: –

  • Data Quality         
  • Collaboration and Maintenance
  • Documentation

Types of Tests: –

  • Generic Tests
  • Singular Tests

Singular Test: –

Singular tests are a type of test in dbt that involves writing a SQL query, which if it returns any rows, would represent a failing test. They are one-off assertions usable for a single purpose. They are defined in .SQL files, typically in the test directory. They can include jinja in the SQL query. An example of a singular test is to check if there are any negative or null values in a table. To create a singular test, you can write a SQL query that returns failing rows and save it in a. sql file within your test directory. It will be executed by the dbt test command.

Test Selection Examples: –

dbt test —- Checks all the tests in project

dbt test –select test_type:singular                                                        # checks only singular tests

dbt test –select test_type:generic                                                        # checks only generic tests

dbt test –select test_name (ex: dbt test –select dim_listings)       # to test single test

dbt test –select config.materialized:table                                          # to test specific materializations

dbt test –select config.materialized:seed                                          # to test seeds

dbt test –select config.materialized:snapshot                                  # to test snapshots

Schema.yml
In dbt (data build tool), the schema.yml file is used to define the structure and configuration of your data models. It includes information such as model descriptions, column descriptions and tests (Generic test,  dbt package related tests like dbt_utils, dbt_expectations)

Advantages: –

  • Used to define the structure and configuration of your data models.
  • It includes information such as model descriptions, column descriptions and tests
  • Makes it easier to maintain your models and keep them up to date.

Generic Tests: –

Generic tests are written and stored in YML files, with parameterized queries that can be used across different DBT models. And they can be used over again and again. The main Components of generic tests:

  • Unique
  • Not null
  • Relationship
  • Accepted Values

Unique: –

unique is a test to verify that every value in a column contains unique values.

e.g.

In the above example it checks if the claim_id columns contain unique values or not. If not, test will be failed otherwise pass.

Not Null: –

The Not Null test ensures that a column in a model doesn’t contain any null values.

e.g:

In the above example it checks if the versions column to see if it contains any null values. If no nulls populated test will be Passed otherwise Failed.

Accepted Values: –

This test is used to validate whether a set of values within a column is present.

In the above example the payee number should accept either 0 or 1. Other than these values test will be Failed.

Relationship: –

A relationship typically involves ensuring that a foreign key relationship between two models is maintained.

e.g:

Kestrel_synergy_report model’s claimant_state column tests a relationship with postal_state in

Seed_loss_location_status for data consistency. If not, it will be Failed.

Custom Generic Tests: –

In dbt, it is also possible to define your own custom generic tests. This may be useful when you find yourself creating similar Singular tests. A custom generic test is essentially the same as a dbt macro which has at least a model as a parameter, and optionally column name. if the test will apply to a column. Once the generic test is defined, it can be applied many times just like the generic tests shipped with dbt Core. It is also possible to pass additional parameters to a custom generic test.

 We create these tests within macros folder.

To run this above code, we have to define it in schema.yml and use dbt test command in terminal.

Advanced Tests in DBT: –

DBT comes built with a handful of built-in generic tests and even more tests are available from community DBT packages. This dbt package contains macros that can be (re)used across dbt projects.

We can directly use this test cases in schema.yml for any required models.

  •   dbt utils
  •   dbt expectations

dbt utils: –

The dbt-utils package is a collection of macros that enhances the dbt experience by offering a suite of utility macros. It is designed to tackle common SQL modeling patterns, streaming complex operations, allowing users to focus on data transformation rather than the intricacies of SQL. Dbt does provide some utility functions and macros that can be used within your dbt projects.

The dbt_utils package include 16 generic tests including:

  • not_accepted_values
  • equal_rowcount
  • fewer_rows_than

You can find detailed information on all the dbt-utils generics tests using given reference link

Reference link: GitHub – dbt-labs/dbt-utils: Utility functions for dbt projects.

You can install the package by including the following in your packages. yml file.

You can then run dbt deps in gitbash to install the package. dbt-expectations.

Below are the sample tests from dbt_utils:

Equal row count: –

Check that two relations (Models) have the same number of rows.

Replace model, compare model with existing models in your dbt.

at_least_one: –

Asserts that a column has at least one value.

Replace model_name with your Model (Table Name), col_name with your Column Name.

dbt expectations: –

dbt-expectations are an extension package for dbt that allows users to deploy data quality tests in their data warehouse directly from dbt. It is inspired by the Great Expectations package for Python. Data quality is an important aspect of data governance, and dbt-expectations help to flag anomalies or quality issues in data.

Tests in dbt-expectations are divided into seven categories encompassing a total of 62 generic dbt tests:

  • Table shape (15 generic dbt tests)
  • Missing values, unique values, and types (6 generic dbt tests)
  • Sets and ranges (5 generic dbt tests)
  • String matching (10 generic dbt tests)
  • Aggregate functions (17 generic dbt tests)
  • Multi-column (6 generic dbt tests)
  • Distributional functions (3 generic dbt tests)

You can find detailed information on all the dbt-expectations generics tests using given reference link

Reference link: GitHub – calogica/dbt-expectations: Port(ish) of Great Expectations to dbt test macros

You can install dbt-expectations by adding the following code to your packages.yml file:

You can then run dbt deps in gitbash to install the package. dbt-expectations.

Below are the sample tests from dbt_expectations

Expect_column_value_lengths_to_equal :-

Expect column entries to be strings with length equal to the provided value.

Replace model_name with your Model (Table Name), col_name with your Column Name.

Expect Column Distinct count to Equal: –

Expect the number of distinct column values to be equal to a given value.

Replace model_name with your Model (Table Name), col_name , col1… with your Column Names.

Expect Column values to be in set: –

Expect each column value to be in a given set.

Replace model_name with your Model (Table Name), col_name  with your Column Name.

Tags: –

In dbt, tags can be applied to tests to help organize and categorize them. Tags provide a way to label or annotate tests based on specific criteria, and you can use these tags for various purposes, such as filtering or grouping tests when running dbt commands.

Commands to run Tags: –

  •   dbt test –select tag: my_tag           (e.g. dbt test –select tag: a)
  •   dbt test –select tag: my_tag  –exclude tag:other_tag      

                 (e.g. – dbt test – select tag: a –exclude tag:b)

Severity: –

In dbt, severity is a configuration option that allows you to set the severity of test results. By default, tests return an error if they fail. However, you can configure tests to return warnings instead of errors, or to make the test status conditional on the number of failures returned.

 Severity: error (default: error)

Severity: warn

For more information you can refer dbt test documentation (https://docs.getdbt.com/docs/build/tests)