Writing a Pipeline YAML
The Quickstart gets a CSV-to-CSV pipeline running in ten minutes. This page covers the next step: writing a pipeline against a real source schema, with lookups, transforms, and an ERP-shaped target. The example is deliberately like a typical day-one migration task.
The scenario
Section titled “The scenario”You’ve been handed access to a legacy Customers table in SQL Server and asked to load it into IFS ERP. The team has done a brief discovery call; here’s what you know:
- Source:
dbo.CustomersinLegacyDB. Columns:CUST_CODE,CUST_NAME,ADDR1,ADDR2,POST_CODE,COUNTRY,EMAIL,TEL,CREDIT_LIMIT,CURRENCY,ACCT_MGR_ID,Active,DELETED. - Active rows only:
Active = 1 AND DELETED = 0. - Target: IFS
CustomerInfo, with the column order specified by the IFS import contract. No header row. - Currency translation: the legacy system uses three-letter internal codes; IFS uses ISO 4217. There’s a CSV mapping (
legacyCode,isoCode). - Account managers: legacy
ACCT_MGR_ID(numeric) maps to an IFSSalesmanCode(alphanumeric). The mapping lives in another SQL Server table. - Quality: every customer needs a
CUST_CODE, and codes must be unique. UK postcodes should passukPostcode(warning only — some clients have foreign addresses). Credit limits must be ≥ 0 and capped at 500,000 (warning).
Step 1 — Sketch the source
Section titled “Step 1 — Sketch the source”Open the source query first; everything else flows from it. You’re not extracting columns you can’t transform, so the SELECT mirrors the IFS-bound shape.
source: adapter: mssql connection: ${SOURCE_MSSQL} query: | SELECT c.CUST_CODE, c.CUST_NAME, c.ADDR1, c.ADDR2, c.POST_CODE, c.COUNTRY, c.EMAIL, c.TEL, c.CREDIT_LIMIT, c.CURRENCY, c.ACCT_MGR_ID FROM dbo.Customers c WHERE c.Active = 1 AND c.DELETED = 0${SOURCE_MSSQL} is resolved at runtime from .env — never hard-code credentials. See Source Adapters → mssql.
Step 2 — Add the DQ rules
Section titled “Step 2 — Add the DQ rules”Sketch the DQ block before the transform. It’s faster to iterate on rules with sluice validate than to discover problems after a load.
dq: stopOnCritical: true rejectionFile: ./output/acme-corp-customers-rejected.csv rules: - field: CUST_CODE checks: - { type: notNull, severity: critical } - { type: unique, severity: critical } - { type: pattern, value: "^[A-Z0-9]{3,10}$", severity: warning } - field: CUST_NAME checks: - { type: notNull, severity: critical } - { type: maxLength, value: 100, severity: warning } - field: POST_CODE checks: - { type: ukPostcode, severity: warning } - field: EMAIL checks: - { type: email, severity: warning } - field: CREDIT_LIMIT checks: - { type: min, value: 0, severity: critical } - { type: max, value: 500000, severity: warning } - field: COUNTRY checks: - { type: allowedValues, value: [GB, IE, US, DE, FR], severity: warning }The trick is to put critical only where downstream must have a clean value (the primary key, the credit limit floor). Everything else stays as warning — the row makes it through, but it lands in the rejection CSV so the source data team sees it.
Step 3 — Define the transforms and lookups
Section titled “Step 3 — Define the transforms and lookups”Two lookups: currency map and account-manager map. The currency map is small and source-controlled, so a CSV; the account-manager map is bigger and mastered in SQL Server, so a query.
transform: lookups: - name: currencyMap source: { adapter: csv, file: ./lookups/currency-codes.csv } key: legacyCode value: isoCode - name: acctMgrMap source: adapter: mssql connection: ${SOURCE_MSSQL} query: "SELECT STAFF_ID as key, IFS_USER_ID as value FROM dbo.Staff" key: key value: value
fields: - { from: CUST_CODE, to: CustomerNo, type: string, max: 20 } - { from: CUST_NAME, to: Name, type: string, max: 100, cleanse: trim|titleCase } - { from: [ADDR1, ADDR2], to: Address1, type: concat, separator: ", ", cleanse: trim } - { from: POST_CODE, to: ZipCode, type: string, cleanse: trim|uppercase } - { from: COUNTRY, to: Country, type: string, default: GB } - { from: CURRENCY, to: CurrencyCode, type: lookup, lookup: currencyMap, default: GBP } - { from: ACCT_MGR_ID, to: SalesmanCode, type: lookup, lookup: acctMgrMap, optional: true } - { from: CREDIT_LIMIT, to: CreditLimit, type: decimal, precision: 2 } - { from: EMAIL, to: Email, type: string, cleanse: trim|lowercase } - { to: CustomerGroup, type: constant, value: DOMESTIC } - { to: SearchName, type: expression, value: "row.CUST_NAME.toUpperCase().substring(0, 20)" }Three things to notice:
- Address concat — IFS expects a single
Address1string, not separate lines. - Currency lookup has
default: GBPso missing currency codes fall through gracefully. - Account-manager lookup has
optional: truebecause the legacy data has gaps; missing values becomenullrather than throwing aTransformError.
See Transforms for every available type and cleanse op.
Step 4 — Configure the target
Section titled “Step 4 — Configure the target”target: adapter: ifs entity: CustomerInfo output: ./output/acme-corp-customers-ifs.csv includeHeader: false columnOrder: - CustomerNo - Name - Address1 - ZipCode - Country - CurrencyCode - SalesmanCode - CreditLimit - Email - CustomerGroup - SearchNameThe columnOrder mirrors the IFS import contract. Get one column wrong and IFS rejects the file, so this list is the most important thing to verify before a live run. See Target Adapters → ifs.
Step 5 — Run state
Section titled “Step 5 — Run state”run: mode: full batchSize: 500 logLevel: info dryRun: falseFor your first run, set dryRun: true (or use the --dry-run flag). Sluice does everything except the load step, so you can verify the rejection CSV and the staged transform output without touching IFS.
The full pipeline
Section titled “The full pipeline”Stick the five sections together, top with a pipeline: block, and you have a complete migration:
pipeline: name: acme-corp-customers client: acme-corp version: "1.0" entity: CustomerInfo description: Customer master — legacy SQL to IFS ERP
source: { ... } # Step 1dq: { ... } # Step 2transform: { ... } # Step 3target: { ... } # Step 4run: { ... } # Step 5Iterating
Section titled “Iterating”The development loop:
# 1. Validate the YAML (no DB access)sluice check customers.pipeline.yaml
# 2. Extract + DQ + transform, no loadsluice validate customers.pipeline.yaml --dry-run
# 3. Inspect the rejection CSV and DQ summaryGet-Content output\acme-corp-customers-rejected.csv | Select-Object -First 20Get-Content output\acme-corp-customers-dq-summary.json | ConvertFrom-Json
# 4. When happy, the live runsluice run customers.pipeline.yamlCommon pitfalls
Section titled “Common pitfalls”- Forgetting
notNullon a foreign-key column. Downstream targets reject loads where required FKs are missing — catch it in DQ. - Date format assumptions. Source dates often arrive as strings without a clear format. Prefer
type: datewith an explicitformat:rather than letting dayjs guess. - Encoding. Source CSVs from European clients are often Latin-1 or Windows-1252, not UTF-8. Set
source.encodingaccordingly. uniqueand incremental mode.uniquechecks the whole extracted dataset, not the union of past runs. If you incrementally extract daily, duplicates that span days won’t be caught — handle uniqueness at the target instead.
Next steps
Section titled “Next steps”- Run it in CI: see CI/CD Integration.
- Need to consolidate from multiple sources? See the multi-source examples in Pipeline YAML Schema.
- Need a custom rule or transform? See the Plugin System guide.