Introduction

The Pointblank library is all about assessing the state of data quality for a table. You provide the validation rules and the library will dutifully interrogate the data and provide useful reporting. We can use different types of tables like Polars and Pandas DataFrames, Parquet files, or various database tables. Let’s walk through what data validation looks like in Pointblank.

A Simple Validation Table

This is a validation report table that is produced from a validation of a Polars DataFrame:

import pointblank as pb

(
    pb.Validate(data=pb.load_dataset(dataset="small_table"), label="Example Validation")
    .col_vals_lt(columns="a", value=10)
    .col_vals_between(columns="d", left=0, right=5000)
    .col_vals_in_set(columns="f", set=["low", "mid", "high"])
    .col_vals_regex(columns="b", pattern=r"^[0-9]-[a-z]{3}-[0-9]{3}$")
    .interrogate()
)
Pointblank Validation
Example Validation
Polars

STEP  VALIDATION          COLUMNS  VALUES                     UNITS  PASS       FAIL
1     col_vals_lt()       a        10                         13     13 (1.00)  0 (0.00)
2     col_vals_between()  d        [0, 5000]                  13     12 (0.92)  1 (0.08)
3     col_vals_in_set()   f        low, mid, high             13     13 (1.00)  0 (0.00)
4     col_vals_regex()    b        ^[0-9]-[a-z]{3}-[0-9]{3}$  13     13 (1.00)  0 (0.00)

(The icon-only columns TBL, EVAL, W, E, C, and EXT are not reproduced here.)

Each row in this reporting table constitutes a single validation step. Roughly, the left-hand side outlines the validation rules and the right-hand side provides the results of each validation step. While simple in principle, there’s a lot of useful information packed into this validation table.

Here’s a diagram that describes a few of the important parts of the validation table:

There are three things that should be noted here:

  • validation steps: each step is a separate test on the table, focused on a certain aspect of the table
  • validation rules: the validation type is provided here along with key constraints
  • validation results: interrogation results are provided here, with a breakdown of test units (total, passing, and failing), threshold flags, and more

The intent is to provide the key information in one place, and have it be interpretable by data stakeholders. For example, a failure can be seen in the second row (notice there’s a CSV button). A data quality stakeholder could click this to download a CSV of the failing rows for that step.

Example Code, Step-by-Step

This section will walk you through the example code used above.

import pointblank as pb

(
    pb.Validate(data=pb.load_dataset(dataset="small_table"))
    .col_vals_lt(columns="a", value=10)
    .col_vals_between(columns="d", left=0, right=5000)
    .col_vals_in_set(columns="f", set=["low", "mid", "high"])
    .col_vals_regex(columns="b", pattern=r"^[0-9]-[a-z]{3}-[0-9]{3}$")
    .interrogate()
)

Note these three key pieces in the code:

  • data: the Validate(data=) argument takes a DataFrame or database table that you want to validate
  • steps: the methods starting with col_vals_ specify validation steps that run on specific columns
  • execution: the interrogate() method executes the validation plan on the table

This is the common pattern for a validation workflow: Validate and interrogate() bookend a validation plan that is built up by calling validation methods.
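To make the plan-then-execute idea concrete, here is a minimal pure-Python sketch of the same chaining pattern. ToyValidate and its table data are hypothetical stand-ins written for illustration only; this is not how Pointblank is implemented, and it borrows only the method and argument names from the example above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyValidate:
    """Toy stand-in for a validation plan (hypothetical, not Pointblank's code)."""

    data: Dict[str, list]  # column name -> list of values
    steps: List[Tuple[str, str, Callable[[object], bool]]] = field(default_factory=list)

    def col_vals_lt(self, columns: str, value) -> "ToyValidate":
        # Only record the step; nothing is checked yet.
        self.steps.append(("col_vals_lt", columns, lambda v: v < value))
        return self  # returning self is what allows method chaining

    def col_vals_in_set(self, columns: str, set: list) -> "ToyValidate":
        allowed = frozenset(set)
        self.steps.append(("col_vals_in_set", columns, lambda v: v in allowed))
        return self

    def interrogate(self) -> List[dict]:
        # Run every recorded step against the data, in the order it was added.
        results = []
        for i, (name, column, test) in enumerate(self.steps, start=1):
            outcomes = [test(v) for v in self.data[column]]
            n_pass = sum(outcomes)
            results.append(
                {"step": i, "type": name, "column": column,
                 "units": len(outcomes), "pass": n_pass,
                 "fail": len(outcomes) - n_pass}
            )
        return results


# Hypothetical three-row table used only for this sketch.
table = {"a": [1, 5, 12], "f": ["low", "mid", "high"]}

report = (
    ToyValidate(data=table)
    .col_vals_lt(columns="a", value=10)
    .col_vals_in_set(columns="f", set=["low", "mid", "high"])
    .interrogate()
)

for row in report:
    print(row)
```

The point of the sketch is the separation of concerns: the validation methods only add steps to the plan, and no data is examined until interrogate() is called.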

In the next few sections we’ll go a bit further by understanding how we can measure data quality and respond to failures.

Understanding Test Units

Each validation step executes one type of validation test on the target table. For example, a col_vals_lt() validation step tests that each value in a column is less than a specified number. The key finding reported for each step is the number of test units that pass or fail.

In the validation report table, test unit metrics are displayed under the UNITS, PASS, and FAIL columns. This diagram explains what the tabulated values signify:

Test units depend on the test being run. Some validation methods test every value in a particular column, so each value is a test unit. Others have only a single test unit, since they aren’t testing individual values but rather whether the overall test passes or fails.
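As a rough illustration of how these counts come about, the following pure-Python sketch tallies test units for a value-by-value check and for a single-unit check. The helper count_test_units and the sample values are hypothetical; the values were chosen so that a "less than 7" check gives 13 units with 11 passing and 2 failing.

```python
def count_test_units(values, test):
    """Tally units, passes, and failures for one check (illustrative helper)."""
    outcomes = [bool(test(v)) for v in values]
    units = len(outcomes)
    n_pass = sum(outcomes)
    n_fail = units - n_pass
    return {
        "units": units,
        "pass": n_pass,
        "fail": n_fail,
        "pass_frac": round(n_pass / units, 2),
        "fail_frac": round(n_fail / units, 2),
    }


# Hypothetical column values: every value is its own test unit.
column_a = [2, 3, 6, 2, 8, 4, 7, 4, 3, 3, 4, 2, 1]
row_wise = count_test_units(column_a, lambda v: v < 7)
print(row_wise)

# A table-level check (here: "does column 'a' exist?") has one test unit,
# because the whole check either passes or fails.
column_names = {"a", "b"}
table_wide = count_test_units(["a" in column_names], lambda ok: ok)
print(table_wide)
```

The fractions are the pass and fail counts divided by the number of units, rounded to two decimals, which is what appears beneath the counts in the PASS and FAIL columns.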

Setting Thresholds for Data Quality Signals

Understanding test units is essential because they form the foundation of Pointblank’s threshold system. Thresholds let you define acceptable levels of data quality, triggering different severity signals (‘warning’, ‘error’, or ‘critical’) when certain failure conditions are met.

Here’s a simple example that uses a single validation step along with thresholds set using the Thresholds class:

(
    pb.Validate(data=pb.load_dataset(dataset="small_table"))
    .col_vals_lt(
        columns="a",
        value=7,

        # Set the 'warning' and 'error' thresholds ---
        thresholds=pb.Thresholds(warning=2, error=4)
    )
    .interrogate()
)
Pointblank Validation
2025-05-21|17:29:36
Polars

STEP  VALIDATION     COLUMNS  VALUES  UNITS  PASS       FAIL
1     col_vals_lt()  a        7       13     11 (0.85)  2 (0.15)

(The icon-only columns TBL, EVAL, W, E, C, and EXT are not reproduced here.)

If you look at the validation report table, we can see:

  • the FAIL column shows that 2 test units have failed
  • the W column (short for ‘warning’) shows a filled gray circle indicating those failing test units reached that threshold value
  • the E column (short for ‘error’) shows an open yellow circle indicating that the number of failing test units is below that threshold

The one final threshold level, C (for ‘critical’), wasn’t set so it appears on the validation table as a long dash.
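The flag logic described above can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not Pointblank’s implementation: the helper threshold_flags is hypothetical, it treats each threshold as an absolute count of failing test units, and it considers a level reached when the failure count is greater than or equal to the threshold, which is what the report above suggests (2 failures reach warning=2 but not error=4).

```python
def threshold_flags(n_fail, warning=None, error=None, critical=None):
    """Return True/False per level, or None where no threshold was set.

    Assumption: thresholds are absolute counts of failing test units and a
    level is reached when n_fail >= threshold.
    """
    levels = {"warning": warning, "error": error, "critical": critical}
    return {
        name: (None if limit is None else n_fail >= limit)
        for name, limit in levels.items()
    }


flags = threshold_flags(n_fail=2, warning=2, error=4)
print(flags)  # {'warning': True, 'error': False, 'critical': None}
```

Here True corresponds to a filled circle in the report, False to an open circle, and None to a level that was never set.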

Taking Action on Threshold Exceedances

Pointblank becomes even more powerful when you combine thresholds with actions. The Actions class lets you trigger responses when validation failures reach or exceed threshold levels, turning passive reporting into active notifications.

Here’s a simple example that adds an action to the previous validation:

(
    pb.Validate(data=pb.load_dataset(dataset="small_table"))
    .col_vals_lt(
        columns="a",
        value=7,
        thresholds=pb.Thresholds(warning=2, error=4),

        # Set an action for the 'warning' threshold ---
        actions=pb.Actions(
            warning="WARNING: Column 'a' has values that aren't less than 7."
        )
    )
    .interrogate()
)
WARNING: Column 'a' has values that aren't less than 7.
Pointblank Validation
2025-05-21|17:29:36
Polars

STEP  VALIDATION     COLUMNS  VALUES  UNITS  PASS       FAIL
1     col_vals_lt()  a        7       13     11 (0.85)  2 (0.15)

(The icon-only columns TBL, EVAL, W, E, C, and EXT are not reproduced here.)

Notice the printed warning message: "WARNING: Column 'a' has values that aren't less than 7." The warning indicator (filled gray circle) confirms that this threshold was reached, which is why the action was triggered.

Actions make your validation workflows more responsive and integrated with your data pipelines. For example, you can generate console messages, Slack notifications, and more.
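The sketch below shows, in plain Python, one way such a dispatch could work once the threshold flags for a step are known. It is a hypothetical illustration rather than Pointblank’s own code: run_actions and notify_team are invented names, and the sketch assumes an action is either a message string (printed to the console) or a callable (invoked with no arguments). The notify_team function merely records a message and stands in for something like a Slack notification.

```python
def run_actions(flags, actions):
    """Fire the action for every level whose threshold was reached.

    flags:   level name -> True (reached), False (not reached), None (not set)
    actions: level name -> message string or zero-argument callable
    Returns the list of levels whose actions fired.
    """
    fired = []
    for level, reached in flags.items():
        action = actions.get(level)
        if not reached or action is None:
            continue  # threshold not reached, not set, or no action defined
        if callable(action):
            action()
        else:
            print(action)
        fired.append(level)
    return fired


sent = []  # stand-in for an external notification channel


def notify_team():
    # Hypothetical notifier; a real one might post to a chat service.
    sent.append("Column 'a' reached the 'error' threshold.")


fired = run_actions(
    flags={"warning": True, "error": False, "critical": None},
    actions={
        "warning": "WARNING: Column 'a' has values that aren't less than 7.",
        "error": notify_team,
    },
)
print(fired)
```

With these flags only the warning message is printed; the notifier attached to the error level stays silent because that threshold was not reached.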