Pipeline YAML Schema

This page is the definitive, searchable reference for every key in a Sluice pipeline YAML. It is auto-generated from the Zod schema in src/config/schema.ts so it cannot drift from the code.

    pipeline:  { ... }   # identity and metadata (always required)
    source:    { ... }   # single-source mode (mutually exclusive with sources + merge)
    sources:   [ ... ]   # multi-source mode; minimum two entries
    merge:     { ... }   # required when sources is present
    enrich:    { ... }   # optional; Phase 4a, private add-on
    dq:        { ... }   # data quality rules
    transform: { ... }   # field mappings, lookups, transforms
    target:    { ... }   # load destination
    run:       { ... }   # execution options (defaults provided)

pipeline

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string (lowercase, hyphenated) | yes | | Pipeline slug used in output filenames and log records. Must match ^[a-z0-9-]+$. |
| client | string | yes | | Client identifier; appears in load reports and run state. |
| version | string | yes | | Pipeline version (quote in YAML to keep it a string). |
| entity | string | yes | | Logical entity name (used in load reports and target adapter metadata). |
| description | string | no | | Human-readable description. |

source

Where the data comes from. Exactly one of query, file, or endpoint must be set. See Source Adapters for adapter-specific config.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| adapter | mssql \| pg \| csv \| xlsx \| rest \| odoo-csv | yes | | Source adapter id. One of mssql, pg, csv, xlsx, rest, odoo-csv. |
| connection | string | no | | Connection string for SQL adapters. Resolves ${ENV_VAR} tokens at load time. Required for mssql and pg. |
| query | string | no | | SELECT statement for SQL adapters. Required for mssql and pg. |
| file | string | no | | Path or glob for file adapters. Required for csv, xlsx, and odoo-csv. |
| endpoint | string | no | | Full URL for the rest adapter. Required for rest. |
| headers | Record<string, string> | no | | Optional HTTP headers, applied to every request from the rest adapter. |
| delimiter | string | no | "," | Field delimiter for the csv and odoo-csv adapters. |
| encoding | string | no | "utf-8" | File encoding for csv, xlsx, and odoo-csv (any Node-supported encoding). |
| sheet | string \| number | no | | Sheet name or 0-based index for the xlsx adapter. |
| pagination | object | no | | Pagination config for the rest adapter. Omit for single-page responses. |
| pivot | object | no | | Used by the odoo-csv adapter only. Declares one column whose Key: value cells should be pivoted into new columns named after the keys. |
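
As a rough illustration of how the keys above combine, the two fragments below sketch a file-based source and a REST source. They use only keys from the table; the file path, URL, and header value are placeholder assumptions, not values from a real project. A pipeline carries a single source: block, so these are alternatives rather than one file.

```yaml
# Sketch: file-based source (path and delimiter are placeholder assumptions)
source:
  adapter: csv
  file: ./data/customers-*.csv   # path or glob
  delimiter: ";"                 # overrides the "," default
  encoding: latin1               # overrides the "utf-8" default
```

```yaml
# Sketch: REST source returning a single page (URL is a placeholder assumption)
source:
  adapter: rest
  endpoint: https://api.example.com/v1/customers
  headers:
    Accept: application/json     # sent with every request
  # pagination omitted: per the table, that is valid for single-page responses
```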

sources

In multi-source mode, replace source: with a sources: array (minimum two entries). Each entry accepts every single-source key listed above plus three multi-source-only keys: id, priority, and rename. sources is mutually exclusive with source.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| adapter | mssql \| pg \| csv \| xlsx \| rest \| odoo-csv | yes | | Source adapter id. One of mssql, pg, csv, xlsx, rest, odoo-csv. |
| connection | string | no | | Connection string for SQL adapters. Resolves ${ENV_VAR} tokens at load time. Required for mssql and pg. |
| query | string | no | | SELECT statement for SQL adapters. Required for mssql and pg. |
| file | string | no | | Path or glob for file adapters. Required for csv, xlsx, and odoo-csv. |
| endpoint | string | no | | Full URL for the rest adapter. Required for rest. |
| headers | Record<string, string> | no | | Optional HTTP headers, applied to every request from the rest adapter. |
| delimiter | string | no | "," | Field delimiter for the csv and odoo-csv adapters. |
| encoding | string | no | "utf-8" | File encoding for csv, xlsx, and odoo-csv (any Node-supported encoding). |
| sheet | string \| number | no | | Sheet name or 0-based index for the xlsx adapter. |
| pagination | object | no | | Pagination config for the rest adapter. Omit for single-page responses. |
| pivot | object | no | | Used by the odoo-csv adapter only. Declares one column whose Key: value cells should be pivoted into new columns named after the keys. |
| id | string | yes | | Source id: lowercase alphanumerics and hyphens only. Must be unique across the array; used as the staging table suffix (stg_raw_{id}). |
| priority | number | yes | | Positive integer priority. A lower number means higher precedence in the coalesce and priority-override strategies. |
| rename | Record<string, string> | no | | Optional rename map { "old column": "new column" } applied in-place after extract. Useful for harmonising CSV/XLSX column headers; SQL/REST sources should rename in the query instead. |
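
A minimal sketch of a two-entry sources: array, assuming one SQL source and one CSV export that share a customer code column. The ids, file path, environment variable, and column names are hypothetical. A merge: block is also required whenever sources is present; it is left out here to keep the fragment short.

```yaml
# Sketch only: ids, paths, and column names are hypothetical
sources:
  - id: erp                        # staged as stg_raw_erp
    priority: 1                    # lower number, higher precedence
    adapter: mssql
    connection: ${SOURCE_MSSQL}
    query: SELECT CUST_CODE, CUST_NAME, EMAIL FROM dbo.Customers
  - id: crm                        # staged as stg_raw_crm
    priority: 2
    adapter: csv
    file: ./data/crm-customers.csv
    rename:                        # harmonise CSV headers with the SQL column names
      "Customer Code": CUST_CODE
      "E-mail": EMAIL
```

The SQL entry aliases columns in its query, while the CSV entry relies on rename, which matches the guidance in the rename row.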

merge

Required whenever sources is set. See How It Works for the merge flow diagram.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| key | string \| string[] | yes | | Merge key: a single column name or an array of columns (composite key). Must exist in every source after rename is applied. |
| strategy | coalesce \| priority-override \| union \| intersect | no | "coalesce" | Merge strategy. coalesce = first non-null value wins, in priority order; priority-override = the highest-priority source always wins; union = all rows, deduplicated by key; intersect = only rows present in every source. |
| onUnmatched | include \| exclude \| warn \| error | no | "include" | What to do with rows that are missing from at least one source. Ignored by intersect. |
| fieldStrategies | object[] | no | [] | Per-field overrides of the top-level strategy. Use to pin a field to a specific source or to apply a different strategy to a single field. |
| conflictLog | string | no | | Optional CSV path to log per-conflict detail (key, field, winning source, source values). Only written when at least one conflict is detected. |
| incrementalSource | string | no | | Source id used for incremental filtering. Required when run.mode: incremental. All other sources are extracted in full on every run. |
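
The fragment below is a sketch of a merge: block built only from the keys in the table. The key column, the source id erp, and the log path are hypothetical; fieldStrategies is omitted because its entry shape is not described on this page.

```yaml
# Sketch only: column name, source id, and log path are hypothetical
merge:
  key: CUST_CODE                   # use an array of column names for a composite key
  strategy: coalesce               # the default, shown explicitly
  onUnmatched: warn                # one of include, exclude, warn, error
  conflictLog: ./output/merge-conflicts.csv
  incrementalSource: erp           # only needed when run.mode is incremental
```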

dq

See Data Quality Rules for the full check-type reference.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| rulesFile | string | no | | Optional path to a composite rule library YAML (Tier 1 plugins). Composite-rule references in rules[] are expanded into built-in checks before Zod validation. |
| stopOnCritical | boolean | no | true | When true, the pipeline halts (exit code 2) if any critical check fails. When false, critical violations are logged but the run continues. |
| rejectionFile | string | no | | Path for the per-violation rejection CSV. Defaults to {outputDir}/{pipeline.name}-rejected.csv. |
| rules | object[] | no | [] | List of DQ rules. Each entry binds a field to one or more checks. |

enrich

Optional. Runs between Extract (or Merge) and DQ. The framework that consumes this block lives in the private @caracal-lynx/sluice-enrich package. If that package is not installed, an enrich: block is still parsed and validated, but the phase is skipped with a WARN log.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| cache | boolean \| "persist" | no | true | Cache strategy for enrich providers. true = in-memory cache; false = no cache; "persist" = on-disk cache between runs. |
| onError | flag \| skip \| fail | no | "flag" | Pipeline-wide error policy when an enrich call fails. flag = mark the row but keep it; skip = drop the row; fail = halt the pipeline. |
| lookups | object[] | yes | | At least one enrich lookup. Each binds a source field to a provider (e.g. VIES, HMRC) and writes the result into named columns. |

transform

Field mappings, lookups, transforms, and cleanse ops. See Transforms for every available type.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| lookups | object[] | no | [] | Named lookup tables, loaded once at the start of the transform phase and cached in memory. Any source adapter (CSV, MSSQL, REST, …) is valid here. |
| fields | object[] | yes | | Field mappings, one entry per output column. Order is preserved. |
| unmappedPlaceholder | string | no | "*** TBC ***" | Value emitted by field mappings with unmapped: true. Used during iterative mapping so draft pipelines run end-to-end before source fields are identified. |
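
To make the draft-mapping workflow concrete, here is a hedged sketch of a transform: block with one finished mapping and one placeholder mapping. The column names and the custom placeholder text are hypothetical, and the exact set of keys a field entry accepts alongside unmapped: true is an assumption; check it against the Transforms reference.

```yaml
# Sketch only: column names are hypothetical; keys next to unmapped are assumed
transform:
  unmappedPlaceholder: "*** TODO ***"   # replaces the "*** TBC ***" default
  fields:
    - { from: CUST_CODE, to: CustomerNo, type: string, max: 20 }
    - { to: VatNumber, type: string, unmapped: true }   # source column not identified yet
```

Under these assumptions, every output row would carry the placeholder text in VatNumber until a source column is mapped.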

target

Where the data goes. See Target Adapters for adapter-specific config.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| adapter | bc \| ifs \| bluecherry \| csv \| pg \| rest | yes | | Target adapter id. Built-in: csv, pg. Paid add-ons: bc, ifs, bluecherry, rest. |
| output | string | no | | Output file path for file-based targets (csv, ifs, bluecherry). |
| entity | string | no | | Logical entity name. ERP adapters use this to resolve required-column lists; OData adapters use it as the entity-set name. |
| connection | string | no | | Connection string for the pg adapter. Resolves ${ENV_VAR} at load time. |
| includeHeader | boolean | no | | Whether to write a header row. Defaults: csv true, ifs false, bluecherry true. |
| columnOrder | string[] | no | | Explicit column ordering for file-based targets. Required by ERP imports that load by position rather than by name. |
| dateFormat | string | no | | dayjs format string for date columns. Defaults: YYYY-MM-DD for csv/ifs, MM/DD/YYYY for bluecherry. |
| delimiter | string | no | "," | Field delimiter for file-based targets. |
| encoding | string | no | "utf-8" | Output file encoding. |
| nullValue | string | no | "" | Rendered value for null/undefined cells in CSV output. |
| template | string | no | | Path to a header-only CSV template, used to override the default required-column ordering for bluecherry. The literal value "default" selects the built-in column set. |
| baseUrl | string | no | | Base URL for the bc adapter. Resolves ${ENV_VAR}. |
| company | string | no | | Company GUID or name for the bc adapter. |
| apiVersion | string | no | "v2.0" | OData API version for the bc adapter. |
| onConflict | fail \| upsert \| ignore | no | "fail" | Behaviour when the target rejects an insert as a conflict. upsert issues a PATCH (bc) or INSERT … ON CONFLICT … DO UPDATE (pg). |
| upsertKey | string[] | no | | Conflict key for pg upserts. Required when onConflict: upsert. |
| batchEndpoint | boolean | no | true | Use OData $batch for the bc adapter (max 100 ops per batch). Disable for adapters that don't support batching. |
| table | string | no | | Target table name for the pg adapter. |
| schema | string | no | "public" | Target schema for the pg adapter. |
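
As an example of the pg-specific keys working together, the sketch below configures an upserting Postgres target. The environment variable, table name, and key column are hypothetical.

```yaml
# Sketch only: variable, table, and column names are hypothetical
target:
  adapter: pg
  connection: ${TARGET_PG}         # resolved from the environment at load time
  schema: public                   # the default, shown explicitly
  table: customers
  onConflict: upsert
  upsertKey: [customer_no]         # required because onConflict is upsert
```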

run

Every field is optional with a sensible default; you can omit the entire run: block.

| Key | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| mode | full \| incremental \| validate-only | no | "full" | Run mode. full extracts and loads everything; incremental filters source records by incrementalField; validate-only runs DQ + transform but skips the load. |
| batchSize | number | no | 500 | DuckDB insert batch size during extract and transform. |
| onError | continue \| stop | no | "continue" | Behaviour when a target adapter rejects a row. continue increments rowsFailed; stop aborts the load. |
| logLevel | debug \| info \| warn \| error | no | "info" | pino log level. debug adds per-row progress lines and disables the progress bar. |
| dryRun | boolean | no | false | When true, skip the load phase even if a target is configured. Equivalent to passing --dry-run on the command line. |
| outputDir | string | no | "./output" | Base directory for run artefacts (rejection CSV, DQ summary JSON, run state JSON, default DuckDB staging file). |
| stagingDb | string | no | "" | DuckDB staging file path. Empty string defaults to {outputDir}/{pipeline.name}.duckdb. Set to :memory: to force in-memory mode. |
| enrichConcurrency | number | no | 5 | Phase 4a: concurrent in-flight enrich provider requests. Consumed by @caracal-lynx/sluice-enrich if installed. |
| enrichTimeoutMs | number | no | 5000 | Phase 4a: per-request timeout for enrich provider calls. Consumed by @caracal-lynx/sluice-enrich if installed. |
| enrichMaxRetries | number | no | 3 | Phase 4a: max retries for enrich provider calls. Consumed by @caracal-lynx/sluice-enrich if installed. |
| incrementalField | string | no | | Source column used to filter incremental runs (must be timestamp-coercible). |
| incrementalSince | string | no | | ISO datetime override for the incremental window start. If empty, the runner reads lastRunAt from the state file. |
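
A short sketch of an incremental run: block, assuming the source exposes a timestamp column. The column name and the datetime are hypothetical. In multi-source mode, merge.incrementalSource would also need to be set.

```yaml
# Sketch only: the column name and datetime are hypothetical
run:
  mode: incremental
  incrementalField: UPDATED_AT               # must be timestamp-coercible
  incrementalSince: "2024-01-01T00:00:00Z"   # omit to fall back to lastRunAt from the state file
  stagingDb: ":memory:"                      # force in-memory DuckDB staging
  logLevel: debug                            # per-row progress lines, no progress bar
```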

A complete single-source pipeline migrating customers from MSSQL to IFS via the paid ifs adapter:

    pipeline:
      name: acme-corp-customers
      client: acme-corp
      version: "1.0"
      entity: CustomerInfo
      description: "Customer master: legacy SQL to IFS ERP"
    source:
      adapter: mssql
      connection: ${SOURCE_MSSQL}
      query: |
        SELECT c.CUST_CODE, c.CUST_NAME, c.POST_CODE, c.EMAIL, c.CREDIT_LIMIT
        FROM dbo.Customers c
        WHERE c.Active = 1 AND c.DELETED = 0
    dq:
      stopOnCritical: true
      rejectionFile: ./output/acme-corp-customers-rejected.csv
      rules:
        - field: CUST_CODE
          checks:
            - { type: notNull, severity: critical }
            - { type: unique, severity: critical }
        - field: EMAIL
          checks:
            - { type: email, severity: warning }
    transform:
      fields:
        - { from: CUST_CODE, to: CustomerNo, type: string, max: 20 }
        - { from: CUST_NAME, to: Name, type: string, cleanse: trim|titleCase }
        - { from: EMAIL, to: Email, type: string, cleanse: trim|lowercase }
        - { to: CustomerGroup, type: constant, value: DOMESTIC }
    target:
      adapter: ifs
      entity: CustomerInfo
      output: ./output/acme-corp-customers-ifs.csv
      includeHeader: false
      columnOrder: [CustomerNo, Name, Email, CustomerGroup]
    run:
      mode: full
      batchSize: 500
      logLevel: info