Quickstart
This page takes you from a clean machine to a working pipeline run with data quality validation, transformation, and a rejection report — all in under ten minutes.
By the end you will have:
- A pipeline YAML that reads a CSV of customers, validates emails, normalises field casing, and writes a clean CSV.
- A rejection CSV showing every row that failed a data quality check, with the rule and severity recorded.
- A DQ summary JSON suitable for dropping into a CI artefact.
Before you start
Install the CLI globally if you haven’t already (see Installation). The rest of this page assumes sluice --version works in your shell.
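Shell shown: PowerShell 7 on Windows. The sluice commands work as-is in Bash, Zsh, and Fish, because Sluice does not depend on shell-specific features. The directory and file-listing commands (New-Item, Set-Location, Get-ChildItem) are PowerShell cmdlets; in other shells use the equivalents, such as mkdir -p, cd, and ls.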
The pipeline
1. Make a working directory.

       New-Item -ItemType Directory -Path sluice-quickstart -Force | Out-Null
       Set-Location sluice-quickstart
       New-Item -ItemType Directory -Path data, output -Force | Out-Null

2. Create a CSV with a few good rows and a few bad ones.

   Save this as data/customers.csv:

       name,email,country
       Ada Lovelace,ada@example.com,GB
       Grace Hopper,grace@example.com,US
       Alan Turing,alan@example.com,GB
       Margaret Hamilton,not-an-email,US
       ,empty-name@example.com,GB
       Linus Torvalds,linus@example.com,FI

   Two of those rows are deliberately broken. The fourth row’s email is not a valid email address; the fifth row has an empty name. We’re going to ask Sluice to catch both.

3. Write the pipeline YAML.

   Save this as customers.pipeline.yaml:

       pipeline:
         name: customers-quickstart
         client: demo
         version: "1.0"
         entity: Customer
         description: "First-pipeline walkthrough: CSV in, clean CSV out."

       source:
         adapter: csv
         file: ./data/customers.csv

       dq:
         stopOnCritical: true
         rejectionFile: ./output/customers-rejected.csv
         rules:
           - field: name
             checks:
               - { type: notNull, severity: critical }
           - field: email
             checks:
               - { type: notNull, severity: critical }
               - { type: email, severity: warning }
           - field: country
             checks:
               - { type: allowedValues, value: [GB, US, FI, DE, FR, IE], severity: warning }

       transform:
         fields:
           - { from: name, to: Name, type: string, cleanse: trim|titleCase }
           - { from: email, to: Email, type: string, cleanse: trim|lowercase }
           - { from: country, to: Country, type: string, default: GB, cleanse: trim|uppercase }
           - { to: Source, type: constant, value: "quickstart" }

       target:
         adapter: csv
         output: ./output/customers-clean.csv
         includeHeader: true

   Take a moment to read the YAML. The four sections (source, dq, transform, target) describe the whole migration. There is no other code to write.

4. Validate the config.

       sluice check customers.pipeline.yaml

   The check command parses the YAML against the Zod schema and exits cleanly if everything is well-formed. If you mistype a key, it tells you exactly where.

5. Do a dry run.

       sluice validate customers.pipeline.yaml

   The validate command extracts, runs DQ, and applies transforms, but does not load to the target. This is the safe way to iterate on rules and mappings.

   You’ll see a phase-by-phase progress bar end with a coloured summary line:

       ✅ Extracted 6 rows
       ⚠️ 4 passed · 2 rejected · 1 critical · 1 warning

   The two rejected rows are the broken ones from step 2: one fails a critical check and one fails a warning check.

6. Run the full pipeline.

       sluice run customers.pipeline.yaml

   The run command does everything validate does, then loads the clean rows to the target adapter. For our CSV target that means writing output/customers-clean.csv.

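To make the transform section of the YAML more concrete, here is a minimal Python sketch of how a pipe-separated cleanse chain such as trim|titleCase could behave. It is not Sluice's implementation: the op names are taken from the YAML, but their exact semantics, the apply_cleanse and transform_row helpers, and the choice to apply default only to missing or empty values are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

# Op names come from the pipeline YAML; the behaviour of each op is an assumption.
CLEANSE_OPS: Dict[str, Callable[[str], str]] = {
    "trim": str.strip,
    "titleCase": str.title,  # assumption: simple per-word capitalisation
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def apply_cleanse(value: Optional[str], chain: str, default: Optional[str] = None) -> Optional[str]:
    """Apply pipe-separated cleanse ops left to right (hypothetical helper).

    Missing or empty values fall back to the default without being cleansed.
    """
    if value is None or value == "":
        return default
    for op_name in chain.split("|"):
        value = CLEANSE_OPS[op_name](value)
    return value


def transform_row(row: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Mirror the four field mappings from customers.pipeline.yaml."""
    return {
        "Name": apply_cleanse(row.get("name"), "trim|titleCase"),
        "Email": apply_cleanse(row.get("email"), "trim|lowercase"),
        "Country": apply_cleanse(row.get("country"), "trim|uppercase", default="GB"),
        "Source": "quickstart",  # constant field: same value on every row
    }


if __name__ == "__main__":
    print(transform_row({"name": "  ada lovelace ", "email": "ADA@Example.com", "country": ""}))
```

Each op is a plain string function, so the chain is just a left-to-right fold; the order in the YAML matters (trimming before casing, for example).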
Inspect the output
Four files land in ./output/:

    Get-ChildItem output
    # customers-clean.csv                   ← rows that passed every critical check, with transformed columns
    # customers-rejected.csv                ← every row that failed any DQ check
    # customers-quickstart-dq-summary.json  ← machine-readable summary
    # customers-quickstart-state.json       ← run state (for incremental mode)

customers-clean.csv should look like this:

    Name,Email,Country,Source
    Ada Lovelace,ada@example.com,GB,quickstart
    Grace Hopper,grace@example.com,US,quickstart
    Alan Turing,alan@example.com,GB,quickstart
    Margaret Hamilton,not-an-email,US,quickstart
    Linus Torvalds,linus@example.com,FI,quickstart

Notice what changed:
- name → Name, email → Email, country → Country (renamed by to:).
- Margaret Hamilton’s row is still present. Her email failed only the warning-level email check, so the row was kept and the failure was recorded in customers-rejected.csv. Only critical checks remove rows from the output. Look again at the rejection file:

    row_index,field,value,rule,severity,message
    4,email,not-an-email,email,warning,must be a valid email address
    5,name,,notNull,critical,must not be null

Row 5 (the empty-name row) was dropped because notNull on name is critical. Row 4 (the bad email) was kept in the output because the email rule was warning, but it was logged so you can fix the source.
That distinction — critical rejects the row, warning keeps it but flags it — is the heart of the data quality model. See Data Quality Rules for the complete reference.
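The severity model described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sluice's code: the run_dq function, the rule tuples, the loose email pattern, and the treatment of empty strings as null are all hypothetical. The rejection record layout follows the columns shown in customers-rejected.csv.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: a deliberately loose email pattern, not the one Sluice uses.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def not_null(value: str) -> bool:
    # Assumption: empty or whitespace-only CSV fields count as null.
    return bool(value and value.strip())


def is_email(value: str) -> bool:
    return bool(EMAIL_RE.match(value or ""))


# (field, rule, severity, predicate, message), mirroring the name and email rules in the YAML.
Rule = Tuple[str, str, str, Callable[[str], bool], str]
RULES: List[Rule] = [
    ("name", "notNull", "critical", not_null, "must not be null"),
    ("email", "notNull", "critical", not_null, "must not be null"),
    ("email", "email", "warning", is_email, "must be a valid email address"),
]

SAMPLE_ROWS: List[Dict[str, str]] = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "country": "GB"},
    {"name": "Grace Hopper", "email": "grace@example.com", "country": "US"},
    {"name": "Alan Turing", "email": "alan@example.com", "country": "GB"},
    {"name": "Margaret Hamilton", "email": "not-an-email", "country": "US"},
    {"name": "", "email": "empty-name@example.com", "country": "GB"},
    {"name": "Linus Torvalds", "email": "linus@example.com", "country": "FI"},
]


def run_dq(rows: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[Dict[str, object]]]:
    """Return (kept_rows, rejections): critical failures drop the row, warnings only log it."""
    kept: List[Dict[str, str]] = []
    rejections: List[Dict[str, object]] = []
    for index, row in enumerate(rows, start=1):  # 1-based, like row_index in the rejection file
        has_critical = False
        for field, rule, severity, passes, message in RULES:
            value = row.get(field, "")
            if not passes(value):
                rejections.append({
                    "row_index": index, "field": field, "value": value,
                    "rule": rule, "severity": severity, "message": message,
                })
                has_critical = has_critical or severity == "critical"
        if not has_critical:
            kept.append(row)
    return kept, rejections


if __name__ == "__main__":
    kept_rows, rejected = run_dq(SAMPLE_ROWS)
    print(len(kept_rows), "rows kept")
    for record in rejected:
        print(record)
```

Every failed check produces a rejection record, but only a critical failure keeps the row out of the kept list, which is why a row can appear in both the output and the rejection file.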
What just happened?
Sluice ran six phases:
- Config load: parsed and validated the YAML against the Zod schema.
- Extract: read data/customers.csv into an embedded DuckDB staging table called stg_raw.
- DQ: ran every rule against stg_raw and wrote the rejection CSV.
- Transform: applied your cleanse ops, default values, and constant field to produce stg_transformed.
- Load: wrote stg_transformed to the target CSV.
- Run state: wrote customers-quickstart-state.json so the next run can resume incrementally if you want.
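The staging-table flow in the phases above can be approximated with a short Python sketch. Note the substitution: it uses SQLite from the Python standard library as a stand-in for the embedded DuckDB instance, and the SQL is a guess at the general shape of each phase rather than what Sluice actually runs. Only the table names stg_raw and stg_transformed come from the text; the sample rows are adapted from the quickstart CSV, and title-casing is omitted because SQLite has no built-in function for it.

```python
import sqlite3

# Stand-in for the embedded DuckDB database described in the text.
conn = sqlite3.connect(":memory:")

# Extract phase: land raw rows in a staging table.
conn.execute("CREATE TABLE stg_raw (name TEXT, email TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO stg_raw VALUES (?, ?, ?)",
    [
        ("Ada Lovelace", "ada@example.com", "GB"),
        ("", "empty-name@example.com", "GB"),
        ("Linus Torvalds", " LINUS@example.com ", "fi"),
    ],
)

# DQ phase, simplified to a single critical notNull check on name.
rejected = conn.execute(
    "SELECT rowid, name FROM stg_raw WHERE name IS NULL OR TRIM(name) = ''"
).fetchall()

# Transform phase: build stg_transformed from rows that passed the critical check.
conn.execute(
    """
    CREATE TABLE stg_transformed AS
    SELECT TRIM(name) AS Name,
           LOWER(TRIM(email)) AS Email,
           UPPER(TRIM(COALESCE(NULLIF(country, ''), 'GB'))) AS Country,
           'quickstart' AS Source
    FROM stg_raw
    WHERE name IS NOT NULL AND TRIM(name) <> ''
    ORDER BY rowid
    """
)

# Load phase: a real target adapter would write these rows to the target.
loaded = conn.execute(
    "SELECT Name, Email, Country, Source FROM stg_transformed ORDER BY rowid"
).fetchall()

if __name__ == "__main__":
    print("rejected:", rejected)
    print("loaded:", loaded)
```

Keeping each phase as a separate table means a failed or interrupted run leaves inspectable intermediate state, which is one plausible reason for staging in a database at all.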
Want to see the diagram of how the phases fit together? Read How It Works.
Where to go next
- Add more checks. Data Quality Rules lists every built-in rule with examples, including unique, pattern, ukPostcode, min/max, maxLength, and allowedValues.
- Connect a real source. Source Adapters covers MSSQL, PostgreSQL, XLSX, and REST.
- Map to an ERP target. Target Adapters covers IFS, Business Central, BlueCherry, generic CSV, and PostgreSQL.
- Write your first non-trivial pipeline. Writing a Pipeline YAML walks through one end-to-end.
- Run it in CI. CI/CD Integration shows the GitHub Actions pattern.