Pipeline YAML Schema

This page is the definitive, searchable reference for every key in a Sluice pipeline YAML. It is auto-generated from the Zod schema in src/config/schema.ts so it cannot drift from the code.

Top-level structure

pipeline:   { ... }   # identity and metadata (always required)
source:     { ... }   # single-source mode (mutually exclusive with sources+merge)
sources:    [ ... ]   # multi-source mode — minimum two entries
merge:      { ... }   # required when sources is present
enrich:     { ... }   # OPTIONAL — Phase 4a; private add-on
dq:         { ... }   # data quality rules
transform:  { ... }   # field mappings, lookups, transforms
target:     { ... }   # load destination
run:        { ... }   # execution options (defaults provided)

`pipeline` — identity and metadata

Key	Type	Required	Default	Description
`name`	string (lowercase, hyphenated)	yes	—	Pipeline slug used in output filenames and log records. Must match `^[a-z0-9-]+$`.
`client`	string	yes	—	Client identifier; appears in load reports and run state.
`version`	string	yes	—	Pipeline version (quote in YAML to keep it a string).
`entity`	string	yes	—	Logical entity name (used in load reports and target adapter metadata).
`description`	string	no	—	Human-readable description.

`source` — single-source pipelines

Where the data comes from. Exactly one of query, file, or endpoint must be set. See Source Adapters for adapter-specific config.

Key	Type	Required	Default	Description
`adapter`	`mssql` \| `pg` \| `csv` \| `xlsx` \| `rest` \| `odoo-csv`	yes	—	Source adapter id. One of `mssql`, `pg`, `csv`, `xlsx`, `rest`, `odoo-csv`.
`connection`	string	no	—	Connection string for SQL adapters. Resolves `${ENV_VAR}` tokens at load time. Required for `mssql` and `pg`.
`query`	string	no	—	SELECT statement for SQL adapters. Required for `mssql` and `pg`.
`file`	string	no	—	Path or glob for file adapters. Required for `csv`, `xlsx`, and `odoo-csv`.
`endpoint`	string	no	—	Full URL for the `rest` adapter. Required for `rest`.
`headers`	`Record<string, string>`	no	—	Optional HTTP headers, applied to every request from the `rest` adapter.
`delimiter`	string	no	`","`	Field delimiter for the `csv` and `odoo-csv` adapters.
`encoding`	string	no	`"utf-8"`	File encoding for `csv`, `xlsx`, and `odoo-csv` (any Node-supported encoding).
`sheet`	string \| number	no	—	Sheet name or 0-based index for the `xlsx` adapter.
`pagination`	object	no	—	Pagination config for the `rest` adapter. Omit for single-page responses.
`pivot`	object	no	—	Used by the `odoo-csv` adapter only. Declares one column whose `Key: value` cells should be pivoted into new columns named after the keys.
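
As an illustration of the keys above, here is a minimal sketch of two alternative `source` blocks, one file-based and one REST-based. The file path, URL, and header value are hypothetical; only the key names come from the table.

```yaml
# File-based source: `file` is the one locator key set for csv.
source:
  adapter: csv
  file: ./data/customers-*.csv   # hypothetical path; globs are accepted
  delimiter: ";"                 # overrides the "," default
  encoding: latin1               # any Node-supported encoding
---
# REST source: `endpoint` replaces `file`; pagination is omitted,
# so the response is treated as a single page.
source:
  adapter: rest
  endpoint: https://api.example.com/v1/customers   # hypothetical URL
  headers:
    Accept: application/json
```

The two documents are separated by `---` only to show them side by side; a real pipeline file contains a single `source` block.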

`sources[]` — multi-source pipelines

In multi-source mode, replace source: with a sources: array (minimum two entries). Each entry accepts every single-source key listed above plus three multi-source-only keys: id, priority, and rename. sources is mutually exclusive with source.

Key	Type	Required	Default	Description
`adapter`	`mssql` \| `pg` \| `csv` \| `xlsx` \| `rest` \| `odoo-csv`	yes	—	Source adapter id. One of `mssql`, `pg`, `csv`, `xlsx`, `rest`, `odoo-csv`.
`connection`	string	no	—	Connection string for SQL adapters. Resolves `${ENV_VAR}` tokens at load time. Required for `mssql` and `pg`.
`query`	string	no	—	SELECT statement for SQL adapters. Required for `mssql` and `pg`.
`file`	string	no	—	Path or glob for file adapters. Required for `csv`, `xlsx`, and `odoo-csv`.
`endpoint`	string	no	—	Full URL for the `rest` adapter. Required for `rest`.
`headers`	`Record<string, string>`	no	—	Optional HTTP headers, applied to every request from the `rest` adapter.
`delimiter`	string	no	`","`	Field delimiter for the `csv` and `odoo-csv` adapters.
`encoding`	string	no	`"utf-8"`	File encoding for `csv`, `xlsx`, and `odoo-csv` (any Node-supported encoding).
`sheet`	string \| number	no	—	Sheet name or 0-based index for the `xlsx` adapter.
`pagination`	object	no	—	Pagination config for the `rest` adapter. Omit for single-page responses.
`pivot`	object	no	—	Used by the `odoo-csv` adapter only. Declares one column whose `Key: value` cells should be pivoted into new columns named after the keys.
`id`	string	yes	—	Source id — lowercase alphanumeric with hyphens only. Must be unique across the array; used as the staging table suffix (`stg_raw_{id}`).
`priority`	number	yes	—	Positive integer. A lower number means higher precedence in the `coalesce` and `priority-override` strategies.
`rename`	`Record<string, string>`	no	—	Optional rename map `{ "old column": "new column" }` applied in-place after extract. Useful for harmonising CSV/XLSX column headers; SQL/REST sources should rename in the query instead.
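
A minimal sketch of a two-entry `sources` array, assuming two hypothetical extracts (`crm` and `billing`) that both carry a customer code. The ids, file paths, and column names are illustrative only.

```yaml
sources:
  - id: crm                      # staged as stg_raw_crm
    adapter: csv
    file: ./data/crm-customers.csv
    priority: 1                  # lowest number, so highest precedence
    rename:
      "Customer Code": CUST_CODE # harmonise the header with the other source
  - id: billing                  # staged as stg_raw_billing
    adapter: xlsx
    file: ./data/billing-customers.xlsx
    sheet: 0                     # first sheet, 0-based
    priority: 2
```

A `sources` array is only valid alongside a `merge` block, which is omitted here for brevity.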

`merge` — multi-source merge config

Required whenever sources is set. See How It Works for the merge flow diagram.

Key	Type	Required	Default	Description
`key`	string \| string[]	yes	—	Merge key — single column name or array of columns (composite key). Must exist in every source after `rename` is applied.
`strategy`	`coalesce` \| `priority-override` \| `union` \| `intersect`	no	`"coalesce"`	Merge strategy. `coalesce` = first non-null wins (priority-ordered); `priority-override` = highest-priority source always wins; `union` = all rows deduped by key; `intersect` = rows present in every source.
`onUnmatched`	`include` \| `exclude` \| `warn` \| `error`	no	`"include"`	What to do with rows whose merge key is missing from at least one source. Ignored by `intersect`.
`fieldStrategies`	object[]	no	`[]`	Per-field overrides of the top-level strategy. Use to pin a specific field to a specific source, or to override the strategy on a single field.
`conflictLog`	string	no	—	Optional CSV path to log per-conflict detail (key, field, winning source, source values). Only written when at least one conflict is detected.
`incrementalSource`	string	no	—	Source id used for incremental filtering. Required when `run.mode: incremental`. Other sources run full each time.
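
For illustration, a `merge` block using only the scalar keys above might look like the sketch below. The column name, log path, and the source id `crm` are hypothetical; `fieldStrategies` is left out because its entry shape is not described on this page.

```yaml
merge:
  key: CUST_CODE                 # or a composite key: [CUST_CODE, POST_CODE]
  strategy: coalesce             # the default: first non-null wins, in priority order
  onUnmatched: warn              # default is include
  conflictLog: ./output/merge-conflicts.csv   # written only if a conflict occurs
  incrementalSource: crm         # needed only when run.mode is incremental
```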

`dq` — data quality

See Data Quality Rules for the full check-type reference.

Key	Type	Required	Default	Description
`rulesFile`	string	no	—	Optional path to a composite rule library YAML (Tier 1 plugins). Composite-rule references in `rules[]` are expanded into built-in checks before Zod validation.
`stopOnCritical`	boolean	no	`true`	When true, the pipeline halts (exit code 2) if any `critical` check fails. When false, critical violations are logged but the run continues.
`rejectionFile`	string	no	—	Path for the per-violation rejection CSV. Defaults to `{outputDir}/{pipeline.name}-rejected.csv`.
`rules`	object[]	no	`[]`	List of DQ rules. Each entry binds a field to one or more checks.

`enrich` — Phase 4a (paid add-on)

Optional. Runs between Extract (or Merge) and DQ. The framework that consumes this block lives in the private @caracal-lynx/sluice-enrich package. If that package is not installed, an enrich: block is still parsed and validated, but the phase is skipped with a WARN log.

Key	Type	Required	Default	Description
`cache`	boolean \| `"persist"`	no	`true`	Cache strategy for enrich providers. `true` = in-memory cache; `false` = no cache; `"persist"` = on-disk cache between runs.
`onError`	`flag` \| `skip` \| `fail`	no	`"flag"`	Pipeline-wide error policy when an enrich call fails. `flag` = mark row but keep it; `skip` = drop the row; `fail` = halt the pipeline.
`lookups`	object[]	yes	—	At least one enrich lookup. Each binds a source field to a provider (e.g. VIES, HMRC) and writes the result into named columns.

`transform`

Field mappings, lookups, transforms, and cleanse ops. See Transforms for every available type.

Key	Type	Required	Default	Description
`lookups`	object[]	no	`[]`	Named lookup tables, loaded once at the start of the transform phase and cached in memory. Any source adapter (CSV, MSSQL, REST, …) is valid here.
`fields`	object[]	yes	—	Field mappings — one entry per output column. Order is preserved.
`unmappedPlaceholder`	string	no	`"* TBC *"`	Value emitted by field mappings with `unmapped: true`. Used during iterative mapping so draft pipelines run end-to-end before source fields are identified.

`target`

Where the data goes. See Target Adapters for adapter-specific config.

Key	Type	Required	Default	Description
`adapter`	`bc` \| `ifs` \| `bluecherry` \| `csv` \| `pg` \| `rest`	yes	—	Target adapter id. Built-in: `csv`, `pg`. Paid add-ons: `bc`, `ifs`, `bluecherry`, `rest`.
`output`	string	no	—	Output file path for file-based targets (`csv`, `ifs`, `bluecherry`).
`entity`	string	no	—	Logical entity name. ERP adapters use this to resolve required-column lists; OData adapters use it as the entity-set name.
`connection`	string	no	—	Connection string for the `pg` adapter. Resolves `${ENV_VAR}` at load time.
`includeHeader`	boolean	no	—	Whether to write a header row. Defaults: `csv` true, `ifs` false, `bluecherry` true.
`columnOrder`	string[]	no	—	Explicit column ordering for file-based targets. Required by ERP imports that load by position rather than by name.
`dateFormat`	string	no	—	dayjs format token for date columns. Defaults: `YYYY-MM-DD` for `csv`/`ifs`, `MM/DD/YYYY` for `bluecherry`.
`delimiter`	string	no	`","`	Field delimiter for file-based targets.
`encoding`	string	no	`"utf-8"`	Output file encoding.
`nullValue`	string	no	`""`	Rendered value for null/undefined cells in CSV output.
`template`	string	no	—	Path to a header-only CSV template, used to override the default required-column ordering for `bluecherry`. The literal `default` selects the built-in column set.
`baseUrl`	string	no	—	Base URL for the `bc` adapter. Resolves `${ENV_VAR}`.
`company`	string	no	—	Company GUID or name for the `bc` adapter.
`apiVersion`	string	no	`"v2.0"`	OData API version for the `bc` adapter.
`onConflict`	`fail` \| `upsert` \| `ignore`	no	`"fail"`	Behaviour when the target rejects an insert as a conflict. `upsert` issues a PATCH (`bc`) or INSERT … ON CONFLICT UPDATE (`pg`).
`upsertKey`	string[]	no	—	Conflict key for `pg` upserts. Required when `onConflict: upsert`.
`batchEndpoint`	boolean	no	`true`	Use OData `$batch` for the `bc` adapter (max 100 ops per batch). Disable for adapters that don’t support batching.
`table`	string	no	—	Target table name for the `pg` adapter.
`schema`	string	no	`"public"`	Target schema for the `pg` adapter.
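
As a sketch of how the `pg`-specific keys combine, the block below configures an upsert into a Postgres table. The environment variable name, table name, and key column are assumptions made for the example.

```yaml
target:
  adapter: pg
  connection: ${TARGET_PG}       # hypothetical env var, resolved at load time
  schema: public                 # the default, shown for clarity
  table: customers               # hypothetical table name
  onConflict: upsert             # INSERT … ON CONFLICT UPDATE
  upsertKey: [customer_no]       # required because onConflict is upsert
```

With the default `onConflict: fail`, `upsertKey` can be omitted.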

`run` — execution options

Every field is optional with a sensible default; you can omit the entire run: block.

Key	Type	Required	Default	Description
`mode`	`full` \| `incremental` \| `validate-only`	no	`"full"`	Run mode. `full` extracts and loads everything; `incremental` filters source records by `incrementalField`; `validate-only` runs DQ + transform but skips the load.
`batchSize`	number	no	`500`	DuckDB insert batch size during extract and transform.
`onError`	`continue` \| `stop`	no	`"continue"`	Behaviour when a target adapter rejects a row. `continue` increments `rowsFailed`; `stop` aborts the load.
`logLevel`	`debug` \| `info` \| `warn` \| `error`	no	`"info"`	pino log level. `debug` adds per-row progress lines and disables the progress bar.
`dryRun`	boolean	no	`false`	When true, skip the load phase even if a target is configured. Equivalent to passing `--dry-run` on the command line.
`outputDir`	string	no	`"./output"`	Base directory for run artefacts (rejection CSV, DQ summary JSON, run state JSON, default DuckDB staging file).
`stagingDb`	string	no	`""`	DuckDB staging file path. Empty string defaults to `{outputDir}/{pipeline.name}.duckdb`. Set to `:memory:` to force in-memory mode.
`enrichConcurrency`	number	no	`5`	Phase 4a — concurrent in-flight enrich provider requests. Consumed by `@caracal-lynx/sluice-enrich` if installed.
`enrichTimeoutMs`	number	no	`5000`	Phase 4a — per-request timeout for enrich provider calls. Consumed by `@caracal-lynx/sluice-enrich` if installed.
`enrichMaxRetries`	number	no	`3`	Phase 4a — max retries for enrich provider calls. Consumed by `@caracal-lynx/sluice-enrich` if installed.
`incrementalField`	string	no	—	Source column used to filter incremental runs (must be timestamp-coercible).
`incrementalSince`	string	no	—	ISO datetime override for the start of the incremental window. If omitted, the runner reads `lastRunAt` from the state file.
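
A minimal sketch of an incremental run with an explicit window start and an in-memory staging database. The column name `UPDATED_AT` and the timestamp are hypothetical.

```yaml
run:
  mode: incremental
  incrementalField: UPDATED_AT              # must be timestamp-coercible
  incrementalSince: "2024-01-01T00:00:00Z"  # omit to fall back to lastRunAt
  stagingDb: ":memory:"                     # quoted so YAML reads the colons as text
  onError: stop                             # abort the load on the first rejected row
```

In multi-source mode, an incremental run also needs `merge.incrementalSource` to name the source being filtered.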

Worked example

A complete single-source pipeline migrating customers from MSSQL to IFS via the paid ifs adapter:

pipeline:
  name: acme-corp-customers
  client: acme-corp
  version: "1.0"
  entity: CustomerInfo
  description: Customer master — legacy SQL to IFS ERP

source:
  adapter: mssql
  connection: ${SOURCE_MSSQL}
  query: |
    SELECT c.CUST_CODE, c.CUST_NAME, c.POST_CODE, c.EMAIL, c.CREDIT_LIMIT
    FROM dbo.Customers c
    WHERE c.Active = 1 AND c.DELETED = 0

dq:
  stopOnCritical: true
  rejectionFile: ./output/acme-corp-customers-rejected.csv
  rules:
    - field: CUST_CODE
      checks:
        - { type: notNull, severity: critical }
        - { type: unique,  severity: critical }
    - field: EMAIL
      checks:
        - { type: email, severity: warning }

transform:
  fields:
    - { from: CUST_CODE, to: CustomerNo, type: string, max: 20 }
    - { from: CUST_NAME, to: Name,       type: string, cleanse: trim|titleCase }
    - { from: EMAIL,     to: Email,      type: string, cleanse: trim|lowercase }
    - { to: CustomerGroup, type: constant, value: DOMESTIC }

target:
  adapter: ifs
  entity: CustomerInfo
  output: ./output/acme-corp-customers-ifs.csv
  includeHeader: false
  columnOrder: [CustomerNo, Name, Email, CustomerGroup]

run:
  mode: full
  batchSize: 500
  logLevel: info
