Pipeline YAML Schema
This page is the definitive, searchable reference for every key in a Sluice pipeline YAML. It is auto-generated from the Zod schema in src/config/schema.ts, so it cannot drift from the code.
Top-level structure
```yaml
pipeline: { ... }    # identity and metadata (always required)
source: { ... }      # single-source mode (mutually exclusive with sources + merge)
sources: [ ... ]     # multi-source mode; minimum two entries
merge: { ... }       # required when sources is present
enrich: { ... }      # OPTIONAL: Phase 4a, private add-on
dq: { ... }          # data quality rules
transform: { ... }   # field mappings, lookups, transforms
target: { ... }      # load destination
run: { ... }         # execution options (defaults provided)
```
pipeline — identity and metadata
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
name | string (lowercase, hyphenated) | yes | — | Pipeline slug used in output filenames and log records. Must match ^[a-z0-9-]+$. |
client | string | yes | — | Client identifier; appears in load reports and run state. |
version | string | yes | — | Pipeline version (quote in YAML to keep it a string). |
entity | string | yes | — | Logical entity name (used in load reports and target adapter metadata). |
description | string | no | — | Human-readable description. |
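The advice to quote version matters because YAML resolves unquoted numeric-looking scalars to numbers. Below is a minimal pipeline block showing the difference. All values are hypothetical.

```yaml
pipeline:
  name: acme-corp-suppliers     # must match ^[a-z0-9-]+$
  client: acme-corp
  version: "1.10"               # quoted: stays the string "1.10"
  # version: 1.10               # unquoted: a YAML parser reads this as the number 1.1
  entity: Supplier
  description: Supplier master migration
```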
source — single-source pipelines
Where the data comes from. Exactly one of query, file, or endpoint must be set. See Source Adapters for adapter-specific config.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
adapter | mssql \| pg \| csv \| xlsx \| rest \| odoo-csv | yes | — | Source adapter id. One of mssql, pg, csv, xlsx, rest, odoo-csv. |
connection | string | no | — | Connection string for SQL adapters. Resolves ${ENV_VAR} tokens at load time. Required for mssql and pg. |
query | string | no | — | SELECT statement for SQL adapters. Required for mssql and pg. |
file | string | no | — | Path or glob for file adapters. Required for csv, xlsx, and odoo-csv. |
endpoint | string | no | — | Full URL for the rest adapter. Required for rest. |
headers | Record<string, string> | no | — | Optional HTTP headers, applied to every request from the rest adapter. |
delimiter | string | no | "," | Field delimiter for the csv and odoo-csv adapters. |
encoding | string | no | "utf-8" | File encoding for csv, xlsx, and odoo-csv (any Node-supported encoding). |
sheet | string \| number | no | — | Sheet name or 0-based index for the xlsx adapter. |
pagination | object | no | — | Pagination config for the rest adapter. Omit for single-page responses. |
pivot | object | no | — | Used by the odoo-csv adapter only. Declares one column whose Key: value cells should be pivoted into new columns named after the keys. |
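The fragment below is a sketch of a file-based source that reads a semicolon-delimited CSV export. The glob and the latin1 encoding are illustrative assumptions. Because file is set, query and endpoint are omitted.

```yaml
source:
  adapter: csv
  file: ./input/customers-*.csv   # path or glob (hypothetical)
  delimiter: ";"                  # default is ","
  encoding: latin1                # any Node-supported encoding; default is utf-8
```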
sources[] — multi-source pipelines
In multi-source mode, replace source: with a sources: array (minimum two entries). Each entry accepts every single-source key above plus three multi-source-only keys: id, priority, and rename. Mutually exclusive with source.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
adapter | mssql \| pg \| csv \| xlsx \| rest \| odoo-csv | yes | — | Source adapter id. One of mssql, pg, csv, xlsx, rest, odoo-csv. |
connection | string | no | — | Connection string for SQL adapters. Resolves ${ENV_VAR} tokens at load time. Required for mssql and pg. |
query | string | no | — | SELECT statement for SQL adapters. Required for mssql and pg. |
file | string | no | — | Path or glob for file adapters. Required for csv, xlsx, and odoo-csv. |
endpoint | string | no | — | Full URL for the rest adapter. Required for rest. |
headers | Record<string, string> | no | — | Optional HTTP headers, applied to every request from the rest adapter. |
delimiter | string | no | "," | Field delimiter for the csv and odoo-csv adapters. |
encoding | string | no | "utf-8" | File encoding for csv, xlsx, and odoo-csv (any Node-supported encoding). |
sheet | string \| number | no | — | Sheet name or 0-based index for the xlsx adapter. |
pagination | object | no | — | Pagination config for the rest adapter. Omit for single-page responses. |
pivot | object | no | — | Used by the odoo-csv adapter only. Declares one column whose Key: value cells should be pivoted into new columns named after the keys. |
id | string | yes | — | Source id — lowercase alphanumeric with hyphens only. Must be unique across the array; used as the staging table suffix (stg_raw_{id}). |
priority | number | yes | — | Positive integer priority. A lower number means higher precedence in the coalesce and priority-override strategies. |
rename | Record<string, string> | no | — | Optional rename map { "old column": "new column" } applied in-place after extract. Useful for harmonising CSV/XLSX column headers; SQL/REST sources should rename in the query instead. |
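Here is a minimal sketch of a two-source sources: array. It assumes a Postgres ERP database and a CRM spreadsheet export. The ids, column names, file path and ${ERP_PG} variable are all hypothetical. The SQL source aliases its columns in the query. The spreadsheet uses rename so that both sources expose the same column names before merge.

```yaml
sources:
  - id: erp                       # staged as stg_raw_erp
    priority: 1                   # lower number = higher precedence
    adapter: pg
    connection: ${ERP_PG}
    query: |
      SELECT cust_code AS "CustomerCode", email AS "Email"
      FROM customers
  - id: crm                       # staged as stg_raw_crm
    priority: 2
    adapter: xlsx
    file: ./input/crm-export.xlsx
    sheet: 0                      # first sheet (0-based index)
    rename:
      "Customer Code": CustomerCode
      "E-mail Address": Email
```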
merge — multi-source merge config
Required whenever sources is set. See How It Works for the merge flow diagram.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
key | string \| string[] | yes | — | Merge key — single column name or array of columns (composite key). Must exist in every source after rename is applied. |
strategy | coalesce \| priority-override \| union \| intersect | no | "coalesce" | Merge strategy. coalesce = first non-null wins (priority-ordered); priority-override = highest-priority source always wins; union = all rows deduped by key; intersect = rows present in every source. |
onUnmatched | include \| exclude \| warn \| error | no | "include" | What to do with rows present in fewer than all sources. Ignored by intersect. |
fieldStrategies | object[] | no | [] | Per-field overrides of the top-level strategy. Use to pin a specific field to a specific source, or to override the strategy on a single field. |
conflictLog | string | no | — | Optional CSV path to log per-conflict detail (key, field, winning source, source values). Only written when at least one conflict is detected. |
incrementalSource | string | no | — | Source id used for incremental filtering. Required when run.mode: incremental. Other sources run full each time. |
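Below is a minimal merge block. It assumes two sources with ids erp (priority 1) and crm (priority 2) that both expose a CustomerCode column. All names and paths are hypothetical.

```yaml
merge:
  key: CustomerCode               # or a composite key, e.g. [Company, CustomerCode]
  strategy: coalesce              # per field, the first non-null value in priority order wins
  onUnmatched: exclude            # drop rows whose key is missing from any source
  conflictLog: ./output/customers-merge-conflicts.csv   # written only if a conflict occurs
  incrementalSource: erp          # only needed when run.mode is incremental
```

Under coalesce, suppose erp has a null Email for a given CustomerCode and crm has a value. The merged row then takes the crm value. Wherever erp's Email is non-null, erp wins because it has the higher priority.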
dq — data quality
See Data Quality Rules for the full check-type reference.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
rulesFile | string | no | — | Optional path to a composite rule library YAML (Tier 1 plugins). Composite-rule references in rules[] are expanded into built-in checks before Zod validation. |
stopOnCritical | boolean | no | true | When true, the pipeline halts (exit code 2) if any critical check fails. When false, critical violations are logged but the run continues. |
rejectionFile | string | no | — | Path for the per-violation rejection CSV. Defaults to {outputDir}/{pipeline.name}-rejected.csv. |
rules | object[] | no | [] | List of DQ rules. Each entry binds a field to one or more checks. |
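To log critical failures without halting the run, set stopOnCritical: false. The sketch below reuses the check types from the worked example at the end of this page. The field names are hypothetical.

```yaml
dq:
  stopOnCritical: false           # critical failures are logged; no exit code 2, the run continues
  # rejectionFile omitted: defaults to {outputDir}/{pipeline.name}-rejected.csv
  rules:
    - field: CustomerCode
      checks:
        - { type: notNull, severity: critical }
        - { type: unique, severity: critical }
    - field: Email
      checks:
        - { type: email, severity: warning }
```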
enrich — Phase 4a (paid add-on)
Optional. Runs between Extract (or Merge) and DQ. The framework that consumes this block lives in the private @caracal-lynx/sluice-enrich package. If that package is not installed, an enrich: block is still parsed and validated, but the phase is skipped with a WARN log.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
cache | boolean \| "persist" | no | true | Cache strategy for enrich providers. true = in-memory cache; false = no cache; "persist" = on-disk cache between runs. |
onError | flag \| skip \| fail | no | "flag" | Pipeline-wide error policy when an enrich call fails. flag = mark row but keep it; skip = drop the row; fail = halt the pipeline. |
lookups | object[] | yes | — | At least one enrich lookup. Each binds a source field to a provider (e.g. VIES, HMRC) and writes the result into named columns. |
transform
Field mappings, lookups, transforms, and cleanse ops. See Transforms for every available type.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
lookups | object[] | no | [] | Named lookup tables, loaded once at the start of the transform phase and cached in memory. Any source adapter (CSV, MSSQL, REST, …) is valid here. |
fields | object[] | yes | — | Field mappings — one entry per output column. Order is preserved. |
unmappedPlaceholder | string | no | "*** TBC ***" | Value emitted by field mappings with unmapped: true. Used during iterative mapping so draft pipelines run end-to-end before source fields are identified. |
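During iterative mapping, you can stub a target column whose source field is not yet known with unmapped: true. The draft pipeline then still runs end to end. This is a minimal sketch. The VatNo column and the TODO placeholder are hypothetical. It also assumes an unmapped entry still declares its type, mirroring the mappings in the worked example.

```yaml
transform:
  unmappedPlaceholder: "TODO"     # overrides the default "*** TBC ***"
  fields:
    - { from: CUST_CODE, to: CustomerNo, type: string, max: 20 }
    - { to: VatNo, type: string, unmapped: true }   # emits "TODO" until a source field is identified
```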
target
Where the data goes. See Target Adapters for adapter-specific config.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
adapter | bc \| ifs \| bluecherry \| csv \| pg \| rest | yes | — | Target adapter id. Built-in: csv, pg. Paid add-ons: bc, ifs, bluecherry, rest. |
output | string | no | — | Output file path for file-based targets (csv, ifs, bluecherry). |
entity | string | no | — | Logical entity name. ERP adapters use this to resolve required-column lists; OData adapters use it as the entity-set name. |
connection | string | no | — | Connection string for the pg adapter. Resolves ${ENV_VAR} at load time. |
includeHeader | boolean | no | — | Whether to write a header row. Defaults: csv true, ifs false, bluecherry true. |
columnOrder | string[] | no | — | Explicit column ordering for file-based targets. Required by ERP imports that load by position rather than by name. |
dateFormat | string | no | — | dayjs format string for date columns. Defaults: YYYY-MM-DD for csv/ifs, MM/DD/YYYY for bluecherry. |
delimiter | string | no | "," | Field delimiter for file-based targets. |
encoding | string | no | "utf-8" | Output file encoding. |
nullValue | string | no | "" | Rendered value for null/undefined cells in CSV output. |
template | string | no | — | Path to a header-only CSV template, used to override the default required-column ordering for bluecherry. Set it to the literal string default to select the built-in column set. |
baseUrl | string | no | — | Base URL for the bc adapter. Resolves ${ENV_VAR}. |
company | string | no | — | Company GUID or name for the bc adapter. |
apiVersion | string | no | "v2.0" | OData API version for the bc adapter. |
onConflict | fail \| upsert \| ignore | no | "fail" | Behaviour when the target rejects an insert as a conflict. upsert issues a PATCH (bc) or INSERT … ON CONFLICT (upsertKey) DO UPDATE (pg). |
upsertKey | string[] | no | — | Conflict key for pg upserts. Required when onConflict: upsert. |
batchEndpoint | boolean | no | true | Use OData $batch for the bc adapter (max 100 ops per batch). Disable for adapters that don’t support batching. |
table | string | no | — | Target table name for the pg adapter. |
schema | string | no | "public" | Target schema for the pg adapter. |
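Here is a sketch of a pg target that upserts on a natural key. The schema, table, key column and ${TARGET_PG} variable are hypothetical. PostgreSQL only accepts ON CONFLICT on columns covered by a unique index or constraint. upsertKey should therefore presumably match one that exists on the target table.

```yaml
target:
  adapter: pg
  connection: ${TARGET_PG}        # resolved from the environment at load time
  schema: staging                 # default is public
  table: customers
  onConflict: upsert              # conflicting rows are updated instead of failing the load
  upsertKey: [customer_no]        # required when onConflict is upsert
```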
run — execution options
Every field is optional with a sensible default; you can omit the entire run: block.
| Key | Type | Required | Default | Description |
|---|---|---|---|---|
mode | full \| incremental \| validate-only | no | "full" | Run mode. full extracts and loads everything; incremental filters source records by incrementalField; validate-only runs DQ + transform but skips the load. |
batchSize | number | no | 500 | DuckDB insert batch size during extract and transform. |
onError | continue \| stop | no | "continue" | Behaviour when a target adapter rejects a row. continue increments rowsFailed; stop aborts the load. |
logLevel | debug \| info \| warn \| error | no | "info" | pino log level. debug adds per-row progress lines and disables the progress bar. |
dryRun | boolean | no | false | When true, skip the load phase even if a target is configured. Equivalent to passing --dry-run on the command line. |
outputDir | string | no | "./output" | Base directory for run artefacts (rejection CSV, DQ summary JSON, run state JSON, default DuckDB staging file). |
stagingDb | string | no | "" | DuckDB staging file path. Empty string defaults to {outputDir}/{pipeline.name}.duckdb. Set to :memory: to force in-memory mode. |
enrichConcurrency | number | no | 5 | Phase 4a — concurrent in-flight enrich provider requests. Consumed by @caracal-lynx/sluice-enrich if installed. |
enrichTimeoutMs | number | no | 5000 | Phase 4a — per-request timeout for enrich provider calls. Consumed by @caracal-lynx/sluice-enrich if installed. |
enrichMaxRetries | number | no | 3 | Phase 4a — max retries for enrich provider calls. Consumed by @caracal-lynx/sluice-enrich if installed. |
incrementalField | string | no | — | Source column used to filter incremental runs (must be timestamp-coercible). |
incrementalSince | string | no | — | ISO datetime override for the incremental window start. If empty, the runner reads lastRunAt from the state file. |
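Below is a sketch of a run block for a single-source incremental load. It assumes the source exposes a timestamp column named ModifiedAt, which is hypothetical. If incrementalSince is left empty, the runner picks up lastRunAt from the state file. Setting it pins the window start, for example for a backfill.

```yaml
run:
  mode: incremental
  incrementalField: ModifiedAt                # must be timestamp-coercible
  incrementalSince: "2024-01-01T00:00:00Z"    # optional; quoted so YAML keeps it a string
  stagingDb: ":memory:"                       # skip the on-disk DuckDB staging file
  logLevel: debug                             # per-row progress lines; progress bar disabled
```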
Worked example
A complete single-source pipeline migrating customers from MSSQL to IFS via the paid ifs adapter:
```yaml
pipeline:
  name: acme-corp-customers
  client: acme-corp
  version: "1.0"
  entity: CustomerInfo
  description: Customer master — legacy SQL to IFS ERP

source:
  adapter: mssql
  connection: ${SOURCE_MSSQL}
  query: |
    SELECT c.CUST_CODE, c.CUST_NAME, c.POST_CODE, c.EMAIL, c.CREDIT_LIMIT
    FROM dbo.Customers c
    WHERE c.Active = 1 AND c.DELETED = 0

dq:
  stopOnCritical: true
  rejectionFile: ./output/acme-corp-customers-rejected.csv
  rules:
    - field: CUST_CODE
      checks:
        - { type: notNull, severity: critical }
        - { type: unique, severity: critical }
    - field: EMAIL
      checks:
        - { type: email, severity: warning }

transform:
  fields:
    - { from: CUST_CODE, to: CustomerNo, type: string, max: 20 }
    - { from: CUST_NAME, to: Name, type: string, cleanse: trim|titleCase }
    - { from: EMAIL, to: Email, type: string, cleanse: trim|lowercase }
    - { to: CustomerGroup, type: constant, value: DOMESTIC }

target:
  adapter: ifs
  entity: CustomerInfo
  output: ./output/acme-corp-customers-ifs.csv
  includeHeader: false
  columnOrder: [CustomerNo, Name, Email, CustomerGroup]

run:
  mode: full
  batchSize: 500
  logLevel: info
```