Testing

Note for developing on macOS: Docker typically runs inside a Linux VM and the socket may not be at the standard /var/run/docker.sock path. The test tooling automatically searches common non-standard socket locations when the standard path is absent. If auto-detection doesn't find your socket, set the DOCKER_HOST environment variable.
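As a rough illustration of the fallback described above, here is a minimal sketch of socket resolution. It is not the test tooling's actual code: the function name `resolve_docker_host`, the candidate list (default socket locations for Docker Desktop, Colima, and OrbStack), and the rule that an explicit DOCKER_HOST takes precedence are all assumptions made for this example.

```rust
use std::path::Path;

/// Hypothetical helper, not the real tooling: picks a Docker endpoint.
/// `exists` is injected so the logic can be exercised without touching the filesystem.
fn resolve_docker_host(
    docker_host: Option<&str>,
    home: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<String> {
    // Assumption: an explicit, non-empty DOCKER_HOST always wins.
    if let Some(host) = docker_host {
        if !host.is_empty() {
            return Some(host.to_string());
        }
    }

    // The standard path is tried first.
    let standard = Path::new("/var/run/docker.sock");
    if exists(standard) {
        return Some(format!("unix://{}", standard.display()));
    }

    // Assumed candidates; the real list searched by the tooling may differ.
    let candidates = [
        home.join(".docker/run/docker.sock"),
        home.join(".colima/default/docker.sock"),
        home.join(".orbstack/run/docker.sock"),
    ];
    candidates
        .iter()
        .find(|p| exists(p.as_path()))
        .map(|p| format!("unix://{}", p.display()))
}
```

When nothing is found the function returns `None`, which corresponds to the case where you need to set DOCKER_HOST yourself.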

The Saluki testing strategy consists of four main pillars:

Unit Tests (Rust/cargo)
Correctness Tests (panoramic)
Integration Tests (panoramic)
Performance Tests (SMP)

Unit tests

These are found throughout the Rust codebase as you would expect. You can run them with cargo test or you can use make test which will run them with cargo nextest for more parallelization. Platform-specific unit tests should be skipped or compiled-out for platforms they're incompatible with. For conventions on test naming, table-driven tests, shared fixtures, and assertion density, see Testing patterns.
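The two options for platform-specific tests (compiling out versus skipping) can be sketched with standard Rust attributes. The function `parse_port` and the test names are hypothetical and exist only for this example; the project's own conventions are in Testing patterns.

```rust
/// Hypothetical function under test; not part of Saluki.
fn parse_port(input: &str) -> Option<u16> {
    input.trim().parse().ok()
}

#[cfg(test)]
mod tests {
    use super::parse_port;

    // Runs on every platform.
    #[test]
    fn parses_port_with_surrounding_whitespace() {
        assert_eq!(parse_port(" 8125 "), Some(8125));
    }

    // Compiled out: this test does not exist in non-Linux builds.
    #[cfg(target_os = "linux")]
    #[test]
    fn procfs_is_available() {
        assert!(std::path::Path::new("/proc/self").exists());
    }

    // Skipped: compiled everywhere, but reported as ignored off macOS.
    // The body is a placeholder standing in for macOS-specific logic.
    #[test]
    #[cfg_attr(not(target_os = "macos"), ignore = "macOS-only behavior")]
    fn macos_only_behavior() {
        assert_eq!(parse_port("0"), Some(0));
    }
}
```

Compiling out suits tests that reference platform-only APIs and would not build elsewhere. Ignoring keeps the test visible in the test report on every platform.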

CI: .gitlab/test.yml—runs on both Linux (amd64/arm64) and macOS (amd64/arm64).

Correctness tests (panoramic)

These tests serve to answer the question: Does ADP produce the same output as the Datadog Agent for a given workload?

To answer this question, a correctness test runs ADP and the Datadog Agent side-by-side in containers and compares their output for a given input. The comparison is semantic rather than byte-for-byte, so it relies on heuristics to decide whether the two outputs are equivalent.
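To make "semantic, not byte-by-byte" concrete, here is a minimal sketch of one possible comparison. It is an assumption-laden illustration, not panoramic's actual logic: the `Metric` type, the tag normalization, the summing of values that share a key, and the numeric tolerance are all hypothetical choices.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified metric; the real payload types differ.
#[derive(Debug, Clone)]
struct Metric {
    name: String,
    tags: Vec<String>,
    value: f64,
}

/// Convenience constructor for examples.
fn metric(name: &str, tags: &[&str], value: f64) -> Metric {
    Metric {
        name: name.to_string(),
        tags: tags.iter().map(|t| t.to_string()).collect(),
        value,
    }
}

/// Normalizes away differences that should not matter: metric order,
/// tag order, and duplicate tags. Values sharing a key are summed
/// (an assumption that only suits counter-like metrics).
fn normalize(metrics: &[Metric]) -> BTreeMap<(String, Vec<String>), f64> {
    let mut out = BTreeMap::new();
    for m in metrics {
        let mut tags = m.tags.clone();
        tags.sort();
        tags.dedup();
        *out.entry((m.name.clone(), tags)).or_insert(0.0) += m.value;
    }
    out
}

/// Two outputs are treated as equivalent when they contain the same keys
/// and every value matches within `tolerance`.
fn semantically_equal(a: &[Metric], b: &[Metric], tolerance: f64) -> bool {
    let (a, b) = (normalize(a), normalize(b));
    a.len() == b.len()
        && a.iter().all(|(key, av)| {
            b.get(key).map_or(false, |bv| (av - bv).abs() <= tolerance)
        })
}
```

With this sketch, two outputs that differ only in tag order compare as equal, while a missing tag or a differing value does not.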

Terminology note: Correctness tests are integration tests in the sense that they run the entire system. However, in our repository, "integration tests" refers to the specific set of smoke tests described below.

Correctness test cases are specified by YAML configuration files found in test/correctness/cases.

Binaries and program flow

All binaries live under bin/correctness/:

Binary	Purpose
panoramic	Unified test runner for both correctness and integration tests.
millstone	Deterministic load generator.
datadog-intake	Mock Datadog API that receives test output.
airlock	Library for running containers in isolated groups.

Test case configs live in test/correctness/cases/ (for example, test/correctness/cases/dsd-plain/config.yaml).

Program flow

panoramic is the entry-point for running a test. When given a correctness test config, it orchestrates the containers using the airlock library (which talks to containerd via gRPC) and asserts the correctness of the output. It:

reads the test configuration files
starts two sets of containers
- millstone -> ADP -> datadog-intake
- millstone -> Datadog Agent -> datadog-intake
takes the output from datadog-intake and asserts that ADP and Datadog Agent behavior were equivalent

Running

# Run all correctness tests:
make test-correctness

# Run a single test case:
make test-correctness-case CASE=dsd-plain

The test-correctness target builds all of the necessary prerequisites (millstone, datadog-intake, panoramic, the bundled Agent/ADP image, and so on) before running the test cases.

Correctness tests are run in the e2e stage in CI (.gitlab/e2e.yml).

Integration tests (panoramic)

Integration tests run a containerized ADP instance and assert high-level invariants: process stability, expected log output, port availability, exit behavior. They catch regressions from enabling new features or settings that cause crashes or early exits. They don't test output correctness. This type of test is often known as a "smoke test." For integration tests that check system output, see correctness tests above.

Running

make test-integration        # build images + run all
make test-integration-quick  # skip image builds
make list-integration-tests  # list available tests

Test case config

Test cases live in test/integration/cases/ as config.yaml files. The runner is panoramic (bin/correctness/panoramic/).

Each test defines a container and a list of high-level assertions. Assertions run sequentially by default but can also be configured to run in parallel.
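The sequential and parallel modes can be pictured with the following sketch. It does not reflect panoramic's real types or config schema: the `Assertion` variants, the `Observed` snapshot, and the choice to stop at the first failure in sequential mode are all assumptions made for illustration.

```rust
use std::thread;

/// Hypothetical assertion kinds, loosely modeled on the invariants listed above.
#[derive(Debug)]
enum Assertion {
    ProcessStillRunning,
    PortListening(u16),
    LogContains(String),
}

/// Hypothetical snapshot of what was observed about the container.
struct Observed {
    logs: String,
    open_ports: Vec<u16>,
    running: bool,
}

fn check(assertion: &Assertion, observed: &Observed) -> Result<(), String> {
    let ok = match assertion {
        Assertion::ProcessStillRunning => observed.running,
        Assertion::PortListening(port) => observed.open_ports.contains(port),
        Assertion::LogContains(needle) => observed.logs.contains(needle.as_str()),
    };
    if ok {
        Ok(())
    } else {
        Err(format!("assertion failed: {assertion:?}"))
    }
}

/// Sequential mode: evaluates in order and stops at the first failure
/// (stopping early is an assumption of this sketch).
fn run_sequentially(assertions: &[Assertion], observed: &Observed) -> Result<(), String> {
    assertions.iter().try_for_each(|a| check(a, observed))
}

/// Parallel mode: evaluates every assertion on its own scoped thread
/// and returns all results in the original order.
fn run_in_parallel(assertions: &[Assertion], observed: &Observed) -> Vec<Result<(), String>> {
    thread::scope(|s| {
        let handles: Vec<_> = assertions
            .iter()
            .map(|a| s.spawn(move || check(a, observed)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("assertion thread panicked"))
            .collect()
    })
}
```

Sequential evaluation suits assertions that depend on ordering, such as checking a log line before checking an exit. Parallel evaluation suits independent checks.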

CI: .gitlab/e2e.yml—same file as correctness, e2e stage, 10 min timeout, retry 2.

Benchmark tests: Single Machine Performance (SMP)

SMP is a system that runs on internal, dedicated infrastructure to check the Agent for performance regressions. It runs experiments across multiple replicates with statistical analysis and posts reports to PRs. It maps to the benchmark stage in GitLab CI (.gitlab/benchmark.yml).

Each experiment pairs a target (an ADP container) with Lading, a deterministic load generator. Lading sends payloads (DogStatsD, OTLP, logs, etc.) at configured rates using seeded generators for reproducibility. SMP measures CPU, memory, and throughput, not output correctness.
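The idea behind seeded generation is that the same seed always produces the same payload stream, so runs are comparable. The sketch below shows that property with a small xorshift generator. It is a stand-in written for this example and is not Lading's implementation; `XorShift64`, `generate_payloads`, and the metric names are hypothetical.

```rust
/// Tiny xorshift PRNG used only to illustrate seeding; not Lading's generator.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so substitute a fixed non-zero value.
        Self { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Produces `count` DogStatsD-style counter lines (`name:value|c`).
/// Every choice is drawn from the seeded generator, so output depends only on the seed.
fn generate_payloads(seed: u64, count: usize) -> Vec<String> {
    let names = ["requests", "latency", "errors"];
    let mut rng = XorShift64::new(seed);
    (0..count)
        .map(|_| {
            let name = names[(rng.next_u64() % names.len() as u64) as usize];
            let value = rng.next_u64() % 1000;
            format!("{name}:{value}|c")
        })
        .collect()
}
```

Calling `generate_payloads` twice with the same seed yields identical vectors, which is what lets a baseline and a comparison run see the same input.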

Experiments

Experiments are defined in test/smp/regression/adp/experiments.yaml. Run make generate-smp-experiments to generate per-case configs into two suites: test/smp/regression/adp/quality-gates/ (experiments with checks: bounds) and test/smp/regression/adp/full/ (all experiments). Each case gets an experiment.yaml (target config) and lading/lading.yaml (load config). See test/smp/README.md for the suite model.

PRs gate only on the quality-gates suite, comparing the branch against the merge-base of main—"has your change regressed or improved against the bounds?" The full suite runs nightly on main (date-anchored against the previous nightly, reported to Slack) and on-demand as a manual job on a PR; it is for long-term trend analysis and does not gate.

You can run experiments locally with smp local-run to debug experiment configs without waiting for CI (single replicate, no statistical analysis). This is mainly useful when iterating on a new or broken experiment—for normal development, lean on CI.

Fuzzing

A fifth type of testing is fuzzing. We aren't doing a lot with fuzzing right now, but what we have uses cargo-fuzz and operates at the function level. More fuzzing coverage will likely come in the future.

Loom and Miri testing

A sixth type of testing targets concurrency and memory-model correctness rather than business logic. Loom exhaustively explores how concurrent operations can interleave, to catch ordering bugs that are hard to hit through normal test runs. Run it with make test-loom. Miri runs tests under an interpreter that catches undefined behavior—data races, use-after-free, invalid memory access—that a normal test run wouldn't detect. Run it with make test-miri.
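For a sense of the kind of code these tools are aimed at, here is a minimal sketch of a release/acquire publication pattern written against the standard library. It is not taken from the Saluki codebase, and `publish_and_read` is a hypothetical name. To check it with Loom, the `std` atomics and threads would typically be swapped for their `loom` equivalents inside a `loom::model` closure; that wiring is omitted here to keep the example dependency-free.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// A producer writes a value and then sets a flag; the consumer waits for
/// the flag and reads the value. Correctness depends on the Release store
/// pairing with the Acquire load. Weakening both flag operations to Relaxed
/// is the sort of ordering bug that rarely shows up in a normal test run.
fn publish_and_read() -> u64 {
    let data = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release: makes the write to `data` visible to whoever acquires `ready`.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire: once `true` is observed, the earlier write to `data` is visible too.
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    let seen = data.load(Ordering::Relaxed);

    producer.join().expect("producer thread panicked");
    seen
}
```

A plain test asserting that `publish_and_read()` returns 42 passes here because the orderings are correct. The value of an exhaustive interleaving checker is in catching the variants where they are not.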

Directory index

.gitlab/
├── test.yml               unit tests
├── e2e.yml                correctness + integration
├── benchmark.yml          SMP benchmarks
└── fuzz.yml               fuzz testing

# custom test tooling
bin/correctness/
├── panoramic/             : unified test runner (correctness + integration)
├── millstone/             : load generator
├── datadog-intake/        : mock Datadog API
├── airlock/               : container isolation lib
└── stele/                 : telemetry data types

# actual test cases
test/
├── correctness/           : correctness test configs
├── integration/           : integration test configs
└── smp/                   : smp experiments
