Quickstart

This page takes you from a clean machine to a working pipeline run with data quality validation, transformation, and a rejection report — all in under ten minutes.

By the end you will have:

A pipeline YAML that reads a CSV of customers, validates emails, normalises field casing, and writes a clean CSV.
A rejection CSV showing every row that failed a data quality check, with the rule and severity recorded.
A DQ summary JSON suitable for dropping into a CI artefact.

Before you start

Install the CLI globally if you haven’t already — see Installation. The rest of this page assumes sluice --version works in your shell.

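Shell shown: PowerShell 7 on Windows. The sluice commands work as-is in Bash, Zsh, and Fish, because Sluice does not depend on shell-specific features. The directory and listing commands (New-Item, Set-Location, Get-ChildItem) are PowerShell cmdlets; in other shells use the equivalents mkdir -p, cd, and ls.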

The pipeline

Make a working directory.

New-Item -ItemType Directory -Path sluice-quickstart -Force | Out-Null
Set-Location sluice-quickstart
New-Item -ItemType Directory -Path data, output -Force | Out-Null

Create a CSV with a few good rows and a few bad ones.

Save this as data/customers.csv:
```
name,email,country
Ada Lovelace,ada@example.com,GB
Grace Hopper,grace@example.com,US
Alan Turing,alan@example.com,GB
Margaret Hamilton,not-an-email,US
,empty-name@example.com,GB
Linus Torvalds,linus@example.com,FI
```
Two of those rows are deliberately broken. The fourth row’s email is not a valid email; the fifth row has an empty name. We’re going to ask Sluice to catch both.
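
To make the two failures concrete, here is a minimal Python sketch of what a notNull check and an email check might do on this file. It is not Sluice’s implementation: the function names and the deliberately simple email pattern are assumptions for illustration, and Sluice’s own email rule may be stricter or looser.

```python
import csv
import io
import re

# The sample file from above, inlined so the sketch is self-contained.
CUSTOMERS_CSV = """\
name,email,country
Ada Lovelace,ada@example.com,GB
Grace Hopper,grace@example.com,US
Alan Turing,alan@example.com,GB
Margaret Hamilton,not-an-email,US
,empty-name@example.com,GB
Linus Torvalds,linus@example.com,FI
"""

# Assumption: a deliberately simple pattern, not the one Sluice uses.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def not_null(value):
    """Hypothetical notNull check: fails on missing or blank values."""
    return value is not None and value.strip() != ""


def is_email(value):
    """Hypothetical email check: blank values are left to notNull."""
    return not not_null(value) or EMAIL_RE.match(value.strip()) is not None


def find_failures(csv_text):
    """Return (row_index, field, rule) per failed check; data rows are 1-based."""
    failures = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        if not not_null(row["name"]):
            failures.append((index, "name", "notNull"))
        if not not_null(row["email"]):
            failures.append((index, "email", "notNull"))
        if not is_email(row["email"]):
            failures.append((index, "email", "email"))
    return failures


print(find_failures(CUSTOMERS_CSV))
# [(4, 'email', 'email'), (5, 'name', 'notNull')]
```

The row numbers count data rows only, so the header line is not row 1.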

Write the pipeline YAML.

Save this as customers.pipeline.yaml:

pipeline:
  name: customers-quickstart
  client: demo
  version: "1.0"
  entity: Customer
  description: First-pipeline walkthrough — CSV in, clean CSV out.

source:
  adapter: csv
  file: ./data/customers.csv

dq:
  stopOnCritical: true
  rejectionFile: ./output/customers-rejected.csv
  rules:
    - field: name
      checks:
        - { type: notNull, severity: critical }
    - field: email
      checks:
        - { type: notNull, severity: critical }
        - { type: email,   severity: warning  }
    - field: country
      checks:
        - { type: allowedValues, value: [GB, US, FI, DE, FR, IE], severity: warning }

transform:
  fields:
    - { from: name,    to: Name,    type: string, cleanse: trim|titleCase }
    - { from: email,   to: Email,   type: string, cleanse: trim|lowercase }
    - { from: country, to: Country, type: string, default: GB, cleanse: trim|uppercase }
    - { to: Source,    type: constant, value: "quickstart" }

target:
  adapter: csv
  output: ./output/customers-clean.csv
  includeHeader: true

Take a moment to read the YAML. After the pipeline metadata block, four sections (source, dq, transform, and target) describe the whole migration. There is no other code to write.
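
The cleanse values in the transform section chain operations with |. As a rough mental model only, the Python sketch below assumes the operations run left to right and that default fills in blank values. Both behaviours are assumptions, and every name in the code is hypothetical rather than part of Sluice.

```python
# Hypothetical stand-ins for the cleanse operations named in the YAML above.
CLEANSE_OPS = {
    "trim": str.strip,
    "titleCase": str.title,  # assumption: simple word-by-word capitalisation
    "lowercase": str.lower,
    "uppercase": str.upper,
}

# The transform.fields entries from the pipeline, written as Python dicts.
FIELDS = [
    {"from": "name", "to": "Name", "type": "string", "cleanse": "trim|titleCase"},
    {"from": "email", "to": "Email", "type": "string", "cleanse": "trim|lowercase"},
    {"from": "country", "to": "Country", "type": "string", "default": "GB",
     "cleanse": "trim|uppercase"},
    {"to": "Source", "type": "constant", "value": "quickstart"},
]


def apply_cleanse(value, spec):
    """Apply a pipe-separated spec such as 'trim|titleCase', left to right."""
    for op_name in filter(None, spec.split("|")):
        value = CLEANSE_OPS[op_name](value)
    return value


def map_field(row, mapping):
    """Produce one output value from a source row and one field mapping."""
    if mapping["type"] == "constant":
        return mapping["value"]
    value = row.get(mapping["from"], "")
    if value.strip() == "" and "default" in mapping:
        value = mapping["default"]  # assumption: default replaces blank values
    return apply_cleanse(value, mapping.get("cleanse", ""))


def transform_row(row):
    """Build the output row, keyed by each mapping's 'to' name."""
    return {m["to"]: map_field(row, m) for m in FIELDS}


print(transform_row({"name": "  ada lovelace ", "email": "Ada@Example.COM", "country": ""}))
# {'Name': 'Ada Lovelace', 'Email': 'ada@example.com', 'Country': 'GB', 'Source': 'quickstart'}
```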

Validate the config.
```
sluice check customers.pipeline.yaml
```
check parses the YAML against the Zod schema and exits cleanly if everything is well-formed. If you mistype a key it tells you exactly where.

Do a dry run.
```
sluice validate customers.pipeline.yaml
```
validate extracts, runs DQ, and applies transforms — but does not load to the target. This is the safe way to iterate on rules and mappings.

You’ll see a phase-by-phase progress bar end with a coloured summary line:
```
✅ Extracted 6 rows
⚠️  4 passed · 2 rejected · 1 critical · 1 warning
```
The two rejected rows are the broken ones you added to data/customers.csv.

Run the full pipeline.
```
sluice run customers.pipeline.yaml
```
run does everything validate does, then loads the clean rows to the target adapter. For our CSV target that means writing output/customers-clean.csv.

Inspect the output

Four files land in ./output/:

Get-ChildItem output
# customers-clean.csv          ← rows with no critical failures, with transformed columns
# customers-rejected.csv       ← every row that failed any DQ check
# customers-quickstart-dq-summary.json   ← machine-readable summary
# customers-quickstart-state.json         ← run state (for incremental mode)

customers-clean.csv should look like this:

Name,Email,Country,Source
Ada Lovelace,ada@example.com,GB,quickstart
Grace Hopper,grace@example.com,US,quickstart
Alan Turing,alan@example.com,GB,quickstart
Margaret Hamilton,not-an-email,US,quickstart
Linus Torvalds,linus@example.com,FI,quickstart

Notice what changed:

name → Name, email → Email, country → Country (renamed by to:).
A Source column was added with the constant value quickstart.
The empty-name row is missing, but Margaret Hamilton’s row is still there despite her invalid email. The rejection file, customers-rejected.csv, shows why:

row_index,field,value,rule,severity,message
4,email,not-an-email,email,warning,must be a valid email address
5,name,,notNull,critical,must not be null

Row 5 (the empty-name row) was dropped because notNull on name is critical. Row 4 (the bad email) was kept in the output because the email rule was warning — but it was logged so you can fix the source.

That distinction — critical rejects the row, warning keeps it but flags it — is the heart of the data quality model. See Data Quality Rules for the complete reference.
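
The severity model can be sketched in a few lines of Python. This only illustrates the behaviour described above, using the rejection file from this walkthrough as input; the function name and return shape are hypothetical, not part of Sluice.

```python
import csv
import io
from collections import Counter

# The rejection file shown above, inlined so the sketch is self-contained.
REJECTIONS_CSV = """\
row_index,field,value,rule,severity,message
4,email,not-an-email,email,warning,must be a valid email address
5,name,,notNull,critical,must not be null
"""


def summarise(rejections_csv, total_rows):
    """Split row indexes into kept and dropped, and tally failures by severity."""
    records = list(csv.DictReader(io.StringIO(rejections_csv)))
    # Only a critical failure removes a row; a warning is logged but the row stays.
    dropped = {int(r["row_index"]) for r in records if r["severity"] == "critical"}
    kept = [i for i in range(1, total_rows + 1) if i not in dropped]
    by_severity = Counter(r["severity"] for r in records)
    return kept, sorted(dropped), dict(by_severity)


kept, dropped, counts = summarise(REJECTIONS_CSV, total_rows=6)
print(kept)     # [1, 2, 3, 4, 6]
print(dropped)  # [5]
print(counts)   # {'warning': 1, 'critical': 1}
```

A CI job could apply the same idea to the real customers-rejected.csv and fail the build when the critical count is non-zero.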

What just happened?

Sluice ran six phases:

Config load — parsed and validated the YAML against the Zod schema.
Extract — read data/customers.csv into an embedded DuckDB staging table called stg_raw.
DQ — ran every rule against stg_raw and wrote the rejection CSV.
Transform — applied your cleanse ops, defaults, and constant to produce stg_transformed.
Load — wrote stg_transformed to the target CSV.
Run state — wrote customers-quickstart-state.json so the next run can resume incrementally if you want.

Want to see the diagram of how the phases fit together? Read How It Works.

Where to go next

Add more checks. Data Quality Rules lists every built-in rule with examples — unique, pattern, ukPostcode, min/max, maxLength, allowedValues.
Connect a real source. Source Adapters covers MSSQL, PostgreSQL, XLSX, and REST.
Map to an ERP target. Target Adapters covers IFS, Business Central, BlueCherry, generic CSV, and PostgreSQL.
Write your first non-trivial pipeline. Writing a Pipeline YAML walks through one end-to-end.
Run it in CI. CI/CD Integration shows the GitHub Actions pattern.