Core Concepts

If you’re going to spend more than ten minutes with Sluice, it’s worth picking up a handful of vocabulary first. Every page in the Pipeline Reference assumes you know these terms.

A pipeline is the unit of work. One YAML file = one pipeline = one migrated entity (customers, products, vendors, purchase orders…). A pipeline declares everything Sluice needs to know to run that migration end-to-end.

Each pipeline has a lowercase, hyphenated name (the name: key), which is used for output filenames and log records. Pipelines live alongside each other in a folder, usually one folder per client engagement.
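The naming convention can be sketched as a small check. The helper names and the exact pattern below are assumptions for illustration, not part of Sluice's API; the real config validation may accept a different character set.

```typescript
// Hypothetical helpers, not Sluice's API.
// Assumption: "lowercase, hyphenated" means groups of lowercase letters
// and digits separated by single hyphens.
const PIPELINE_NAME = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidPipelineName(name: string): boolean {
  return PIPELINE_NAME.test(name);
}

// Turn a free-form label into a candidate pipeline name.
function toPipelineName(label: string): string {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every other run of characters into one hyphen
    .replace(/^-+|-+$/g, ""); // strip leading and trailing hyphens
}
```

For example, toPipelineName("Purchase Orders") produces a name that passes isValidPipelineName, while a name containing uppercase letters or underscores does not.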

A source is where data comes from. Sluice ships with five built-in source adapters: mssql, pg, csv, xlsx, and rest. Each one knows how to stream raw records from its system into the staging store.

A pipeline has either a single source: block or a sources: array (multi-source mode). See Source Adapters.
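The either/or rule can be pictured as a normalisation step that always yields a list of sources. This is a sketch under assumptions: the type shapes, the id field name, and the normaliseSources function are hypothetical, and real adapter configs carry more fields than shown here.

```typescript
// Hypothetical shapes for illustration only.
type AdapterType = "mssql" | "pg" | "csv" | "xlsx" | "rest";

interface SourceConfig {
  type: AdapterType;
  id?: string; // assumed field name for the per-source identifier
}

interface PipelineSources {
  source?: SourceConfig;
  sources?: SourceConfig[];
}

// Return the sources as a list, rejecting configs that declare both or neither.
function normaliseSources(cfg: PipelineSources): SourceConfig[] {
  const hasSingle = cfg.source !== undefined;
  const hasMany = cfg.sources !== undefined;
  if (hasSingle === hasMany) {
    throw new Error("Declare exactly one of `source` or `sources`");
  }
  if (hasSingle) return [cfg.source as SourceConfig];
  const many = cfg.sources as SourceConfig[];
  if (many.length === 0) throw new Error("`sources` must not be empty");
  return many;
}
```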

Between phases, data lives in the staging store: an embedded DuckDB database in a single file (or :memory: if you’d rather not touch disk). DuckDB is fast, columnar, and ships with the @duckdb/node-api npm package; there’s nothing else to install.

Four named tables show up, depending on whether the pipeline is single-source or multi-source:

| Table | When | Contents |
| --- | --- | --- |
| stg_raw | After Extract | Untouched source records (single-source) |
| stg_raw_{sourceId} | After Extract | Per-source raw records (multi-source) |
| stg_merged | After Merge | Merged records (multi-source only) |
| stg_transformed | After Transform | Final shape, ready to load |

The default staging file is {outputDir}/{pipeline.name}.duckdb. You can open it in DuckDB CLI or DBeaver and query it directly — handy for debugging.
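As a rough guide to which tables to expect when you open the staging file, the naming scheme above can be written out as follows. Both functions are hypothetical helpers rather than Sluice's API, and the path is built with a plain forward slash for simplicity.

```typescript
// Hypothetical helpers that mirror the staging table naming scheme.
// Call with no argument for a single-source pipeline, or with the list of
// source ids for a multi-source pipeline.
function stagingTables(sourceIds?: string[]): string[] {
  if (sourceIds === undefined) {
    // Single-source mode: one raw table, no merge step.
    return ["stg_raw", "stg_transformed"];
  }
  // Multi-source mode: one raw table per source, then the merged table.
  const raw = sourceIds.map((id) => `stg_raw_${id}`);
  return [...raw, "stg_merged", "stg_transformed"];
}

// Default staging file location: {outputDir}/{pipeline.name}.duckdb
function stagingFile(outputDir: string, pipelineName: string): string {
  return `${outputDir}/${pipelineName}.duckdb`;
}
```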

Data quality (DQ) checks are a configurable set of rules that run against the staged source data before the transform phase. Rules can be notNull, unique, pattern, email, ukPostcode, min/max, maxLength, allowedValues, or any custom rule loaded via the plugin system.

Each check declares a severity:

  • critical — failing rows are removed from the output. With dq.stopOnCritical: true (default), the pipeline halts.
  • warning — failing rows are kept in the output but logged in the rejection CSV.
  • info — recorded in the DQ summary JSON only.

Rejected rows go to {outputDir}/{pipeline.name}-rejected.csv. A summary lands at {outputDir}/{pipeline.name}-dq-summary.json. See Data Quality Rules.
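The three severities can be sketched as a partitioning step over the violations found in one run. This is an illustration under assumptions, not the DQ engine's actual code: the Violation and DqOutcome shapes are hypothetical, and it assumes critical failures are written to the rejection CSV alongside warnings.

```typescript
type Severity = "critical" | "warning" | "info";

// Hypothetical record of one failed check on one row.
interface Violation {
  rowIndex: number;
  rule: string;
  severity: Severity;
}

interface DqOutcome {
  keptRows: number[]; // row indexes that continue to the transform phase
  rejectionLog: Violation[]; // entries destined for the rejection CSV
  summaryOnly: Violation[]; // info-level entries, recorded in the summary JSON only
  halt: boolean; // true when the run should stop after DQ
}

function applySeverities(
  rowCount: number,
  violations: Violation[],
  stopOnCritical = true, // mirrors the documented default of dq.stopOnCritical
): DqOutcome {
  const criticalRows = new Set(
    violations.filter((v) => v.severity === "critical").map((v) => v.rowIndex),
  );
  const keptRows: number[] = [];
  for (let i = 0; i < rowCount; i++) {
    if (!criticalRows.has(i)) keptRows.push(i);
  }
  return {
    keptRows,
    // Critical and warning failures are both logged; only critical rows are dropped.
    rejectionLog: violations.filter((v) => v.severity !== "info"),
    summaryOnly: violations.filter((v) => v.severity === "info"),
    halt: stopOnCritical && criticalRows.size > 0,
  };
}
```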

The transform is a field-level mapping from the raw schema to the output schema. Each entry in transform.fields[] declares a type (string, number, decimal, boolean, date, lookup, concat, constant, expression, custom), an optional cleanse chain (trim|titleCase|normaliseUnicode), and a destination field name.

Lookup tables — small key/value mappings used for currency-code translation, account-manager-id mapping, etc. — are loaded once at the start of the transform phase and cached in memory. They can come from any source adapter (CSV, MSSQL, REST…). See Transforms.
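A cleanse chain such as trim|titleCase|normaliseUnicode reads as a left-to-right pipeline of string functions. The sketch below shows that idea only; the step implementations are assumptions (for instance, NFC is assumed as the Unicode normalisation form) and may not match what Sluice's built-in cleansers do.

```typescript
type Cleanser = (value: string) => string;

// Assumed behaviours for the three step names; the real built-ins may differ.
const CLEANSERS: Record<string, Cleanser> = {
  trim: (v) => v.trim(),
  titleCase: (v) =>
    v
      .toLowerCase()
      .replace(/(^|\s)(\S)/g, (_m: string, lead: string, ch: string) => lead + ch.toUpperCase()),
  normaliseUnicode: (v) => v.normalize("NFC"),
};

// Apply each named step in order, feeding the output of one into the next.
function applyCleanse(chain: string, value: string): string {
  return chain
    .split("|")
    .map((step) => step.trim())
    .filter((step) => step.length > 0)
    .reduce((acc, step) => {
      const fn = CLEANSERS[step];
      if (!fn) throw new Error(`Unknown cleanse step: ${step}`);
      return fn(acc);
    }, value);
}
```

Order matters in a chain like this: trimming before title-casing means leading whitespace never influences which character is treated as the start of a word.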

A target is where data goes. The open-source core ships with two targets: csv and pg. ERP-specific targets (bc for Business Central, ifs for IFS, bluecherry for BlueCherry) are paid add-ons. See Target Adapters.

After every run, Sluice writes {outputDir}/{pipeline.name}-state.json:

{
  "pipeline": "customers-quickstart",
  "lastRunAt": "2026-04-15T09:30:00.000Z",
  "lastMode": "full",
  "rowsExtracted": 1842,
  "rowsLoaded": 1801,
  "criticalViolations": 0,
  "warnings": 41,
  "incrementalSince": ""
}

When you run with mode: incremental, Sluice reads lastRunAt from this file and only extracts records newer than that timestamp. The state file is the only built-in incremental marker — there’s no central daemon, no scheduled job, no shared metadata store.
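The incremental lookup can be sketched as a function from the run mode and the state file contents to a cutoff timestamp. The function name and the fallback to a full extract when no state file exists are assumptions for illustration; the RunState fields are taken from the example above.

```typescript
// Field names follow the example state file.
interface RunState {
  pipeline: string;
  lastRunAt: string; // ISO 8601 timestamp
  lastMode: string;
  rowsExtracted: number;
  rowsLoaded: number;
  criticalViolations: number;
  warnings: number;
  incrementalSince: string;
}

type Mode = "full" | "incremental";

// Hypothetical helper: returns the extraction cutoff, or null when
// every record should be extracted.
function extractionCutoff(mode: Mode, stateJson: string | null): Date | null {
  // Assumption: with no state file yet, an incremental run extracts everything.
  if (mode === "full" || stateJson === null) return null;
  const state = JSON.parse(stateJson) as RunState;
  const cutoff = new Date(state.lastRunAt);
  if (Number.isNaN(cutoff.getTime())) {
    throw new Error(`Invalid lastRunAt in state file: ${state.lastRunAt}`);
  }
  return cutoff;
}
```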

Every single-source run walks through six phases in order. Each phase has its own engine module so the boundaries stay clean.

flowchart LR
    A[Config Load<br/>+ Zod Validation] --> B[Source Adapter<br/>Extract → stg_raw]
    B --> C[DQ Engine<br/>Validate stg_raw]
    C --> D[Transform Engine<br/>stg_raw → stg_transformed]
    D --> E[Target Adapter<br/>Load Output]
    E --> F[Write Run<br/>State File]

Multi-source pipelines insert a Merge phase between Extract and DQ — see How It Works for the full picture.
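The phase order for both modes can be written down as a short list. The phase identifiers below are labels chosen for this sketch, not names taken from Sluice's engine modules.

```typescript
// Illustrative phase labels; the engine's own module names may differ.
type Phase = "config" | "extract" | "merge" | "dq" | "transform" | "load" | "state";

function phasesFor(multiSource: boolean): Phase[] {
  const phases: Phase[] = ["config", "extract", "dq", "transform", "load", "state"];
  if (multiSource) {
    // Merge slots in between Extract and DQ.
    phases.splice(phases.indexOf("dq"), 0, "merge");
  }
  return phases;
}
```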

Sluice has a three-tier extension model:

  • Tier 1 — Composite rules (YAML) — group built-in DQ checks into a named rule and reference it as { type: composite, name: validUkBusinessAddress }. Zero code.
  • Tier 2 — File plugins (TypeScript) — drop a *.rule.ts, *.transform.ts, or *.merge.ts file into a plugins/ directory next to your pipelines. Sluice auto-discovers them at runtime.
  • Tier 3 — npm packages — publish a reusable plugin set as @scope/sluice-adapter-*, @scope/etl-rules-*, etc., and reference it from sluice.config.yaml.

See Plugin System for the author guide.
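The Tier 2 file-name convention lends itself to a simple classifier. This sketch only shows how the three suffixes could be told apart; it is not Sluice's discovery code, and the function names are hypothetical.

```typescript
type PluginKind = "rule" | "transform" | "merge";

// Classify a file name by its suffix; null means "not a plugin file".
function pluginKind(fileName: string): PluginKind | null {
  const match = /\.(rule|transform|merge)\.ts$/.exec(fileName);
  return match ? (match[1] as PluginKind) : null;
}

// Group the contents of a plugins/ directory listing by plugin kind.
function groupPlugins(fileNames: string[]): Record<PluginKind, string[]> {
  const groups: Record<PluginKind, string[]> = { rule: [], transform: [], merge: [] };
  for (const name of fileNames) {
    const kind = pluginKind(name);
    if (kind) groups[kind].push(name);
  }
  return groups;
}
```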

Six commands cover everything Sluice does:

| Command | What it does |
| --- | --- |
| sluice run <pipeline.yaml> | Full pipeline run (extract → DQ → transform → load) |
| sluice validate <pipeline.yaml> | DQ + transform only; no load |
| sluice profile <pipeline.yaml> | Extract + column profiling; no DQ |
| sluice check <pipeline.yaml> | Config validation only; no execution |
| sluice plugins | List all loaded rule, transform, and merge plugins |
| sluice merge list-strategies | List all registered merge strategies |

Exit codes: 0 success · 1 pipeline error · 2 DQ critical violations · 3 config error · 4 enrich error.
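If you wrap the CLI in a script or CI job, the exit codes can be mapped to readable labels. The constant and function names here are hypothetical; only the numeric codes and their meanings come from the list above.

```typescript
// Numeric values are the documented exit codes; the key names are illustrative.
const EXIT = {
  success: 0,
  pipelineError: 1,
  dqCritical: 2,
  configError: 3,
  enrichError: 4,
} as const;

function describeExit(code: number): string {
  switch (code) {
    case EXIT.success:
      return "success";
    case EXIT.pipelineError:
      return "pipeline error";
    case EXIT.dqCritical:
      return "DQ critical violations";
    case EXIT.configError:
      return "config error";
    case EXIT.enrichError:
      return "enrich error";
    default:
      return `unknown exit code ${code}`;
  }
}
```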

You’re ready to write a real pipeline. Start with the Quickstart for a hands-on walkthrough, or jump straight into Writing a Pipeline YAML for an opinionated authoring tour.