Definition
What is a data-quality firewall?
A data-quality firewall is a governed control layer that checks operational records before they land in trusted systems. It validates facts, applies deterministic policy, escalates ambiguity, blocks risky changes, and verifies approved writebacks.
Why it matters
CRM, ERP, analytics, and AI workflows depend on records that are correct enough to trust. If bad data lands first, teams discover the damage later in reports, automations, customer journeys, invoices, or model output.
Operational data is different from warehouse data. A customer record, lead, account, order, product, invoice, or support case often triggers immediate action: a sales rep gets routed, an ERP sync starts, a quote is prepared, a shipment is planned, or an AI assistant recommends the next step. When the record is wrong, the business does not merely get a bad dashboard. It may create duplicate work, contact the wrong customer, block an invoice, confuse ownership, or push bad context into an agentic workflow.
A firewall pattern is useful because the control point moves closer to where damage begins. Instead of waiting for a report to reveal that bad records already spread, a governed path checks records before they become trusted operational truth.
Common failure modes
- A lead, contact, account, or customer resembles an existing record but cannot be merged safely without evidence.
- A record arrives with a missing domain, malformed email, invalid phone number, bad country code, impossible date, or placeholder values.
- Imports, enrichment tools, partner feeds, and manual edits disagree about owner, status, industry, or customer identity.
- A copilot, workflow agent, or RAG process reads stale or unverified CRM and ERP context as if it were trusted.
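The field-level failures above are the easiest to catch mechanically. The following is a minimal Python sketch, assuming hypothetical field names (`email`, `phone`, `country`, `created`) and deliberately simple rules. A production check would use stricter email, phone, and country-code validation.

```python
import re
from datetime import date

# Hypothetical placeholder values; a real list would come from policy.
PLACEHOLDERS = {"", "n/a", "na", "none", "null", "test", "unknown", "-"}
# Deliberately simple pattern: something@domain.tld with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Illustrative subset of ISO 3166-1 alpha-2 country codes.
COUNTRY_CODES = {"NL", "DE", "US", "GB", "FR"}


def field_issues(record: dict) -> list[str]:
    """Return issue labels for one record; an empty list means clean."""
    issues = []
    for name, value in record.items():
        if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
            issues.append(f"placeholder:{name}")
    if not EMAIL_RE.match(record.get("email", "")):
        issues.append("malformed_email")
    digits = re.sub(r"\D", "", record.get("phone", ""))
    if not 7 <= len(digits) <= 15:
        issues.append("invalid_phone")
    if record.get("country", "").upper() not in COUNTRY_CODES:
        issues.append("bad_country_code")
    created = record.get("created")
    if isinstance(created, date) and created > date.today():
        issues.append("impossible_date")
    return issues
```

Returning labels rather than raising on the first failure keeps every detected issue available for later policy decisions and for the receipt.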
How it works
- Validate: check required fields, formats, enums, duplicates, and relationship constraints.
- Decide: apply deterministic policy first, and escalate ambiguous identity, enrichment, or conflict cases to AI or human review.
- Write back: only approved execution plans reach target systems, and each write is verified after it is applied.
- Record: every decision produces a receipt containing the policy, actor, timestamp, evidence, and result.
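The steps above can be sketched as a small routing function. This is a minimal illustration, not Refinery's implementation: the issue labels, the `AUTO_FIX` and `NEEDS_REVIEW` policy tables, and the `route` and `approved_plan` functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical policy tables: which issues may be fixed without review.
AUTO_FIX = {"email_casing", "whitespace"}
NEEDS_REVIEW = {"possible_duplicate", "missing_industry", "owner_conflict"}


@dataclass
class Decision:
    issue: str
    action: str      # "auto_fix", "review", or "block"
    actor: str
    timestamp: str   # ISO 8601, UTC


def route(issues: list[str], actor: str = "policy-engine") -> list[Decision]:
    """Apply deterministic policy first; escalate or block everything else."""
    now = datetime.now(timezone.utc).isoformat()
    decisions = []
    for issue in issues:
        if issue in AUTO_FIX:
            action = "auto_fix"
        elif issue in NEEDS_REVIEW:
            action = "review"
        else:
            action = "block"  # issues the policy does not know are blocked
        decisions.append(Decision(issue, action, actor, now))
    return decisions


def approved_plan(decisions: list[Decision]) -> list[str]:
    """Only auto-fixable issues form the execution plan in this sketch."""
    if any(d.action == "block" for d in decisions):
        return []
    return [d.issue for d in decisions if d.action == "auto_fix"]
```

Returning an empty plan whenever any issue is blocked is a conservative choice made for this sketch. A real policy might still allow unrelated safe fixes on the same record.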
Example
A web form submits a lead with inconsistent casing and a possible duplicate account. Refinery can normalize safe fields automatically, route the duplicate merge to review, write only the approved update, and verify the CRM target state.
One governed record, end to end
Input: Acme BV, JOHN@ACME.COM, possible duplicate of ACME International B.V., missing industry, unverified source.
Detected issues: email casing, possible duplicate account, missing industry, and unverified source.
Policy: email casing can be normalized automatically; duplicate merge requires review; industry enrichment requires AI judgment and approval.
Decision: email normalized, duplicate sent to review, enrichment held as a suggestion.
Writeback: only the approved deterministic update reaches the target CRM record.
Verification: target CRM record confirmed.
Receipt: policy, actor, timestamp, evidence, and result are retained for audit and operator review.
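The walkthrough above can be traced in a few lines of Python. The sketch assumes an in-memory dictionary standing in for the CRM, a made-up record id, and hypothetical helper names (`normalize_email`, `write_and_verify`). A real integration would call the target system's API for both the write and the read-back.

```python
def normalize_email(value: str) -> str:
    """Deterministic fix: trim whitespace and lowercase the address."""
    return value.strip().lower()


def write_and_verify(crm: dict, record_id: str, changes: dict) -> bool:
    """Apply approved changes, then read the record back and compare."""
    crm[record_id].update(changes)
    stored = crm[record_id]
    return all(stored.get(field) == value for field, value in changes.items())


# Stand-in for the target CRM (hypothetical record id).
crm = {"lead-1": {"company": "Acme BV", "email": "JOHN@ACME.COM", "industry": None}}

# Only the approved deterministic update enters the plan.
approved = {"email": normalize_email(crm["lead-1"]["email"])}
# Held items are tracked but never written in this step.
held = {
    "possible_duplicate": "ACME International B.V.",  # routed to review
    "industry": None,  # enrichment held as a suggestion
}

verified = write_and_verify(crm, "lead-1", approved)
print(crm["lead-1"]["email"], verified)  # john@acme.com True
```

With an in-memory dictionary the read-back always succeeds. The step matters against a real target, where triggers, field-level permissions, or concurrent edits can leave the stored value different from the value that was sent.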
Where Refinery fits
Refinery is a data-quality firewall for governed paths between source systems and trusted destinations. It is more than a dashboard, and it is not an unchecked AI agent. It is deterministic by default, AI by exception, and receipt-driven.
Refinery does not replace data observability, deduplication tools, master data management, or CRM validation. It sits at a practical control point: source-to-target paths where bad records need to be checked before they land. Deterministic policy handles clear cases. AI is reserved for ambiguity. Humans review risk. Writebacks are verified after execution.
How it differs from observability, dedupe, and governance
Data observability is valuable for monitoring freshness, lineage, pipeline health, anomalies, and warehouse reliability. It usually tells teams something went wrong after data moved. A data-quality firewall is pre-landing control for operational records.
Dedupe tools help detect or merge duplicate records. A firewall can use duplicate suspicion as one signal, but it also governs required fields, formats, enrichment, ownership, writeback eligibility, review state, and verification.
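To illustrate duplicate suspicion as a signal rather than a merge decision, here is a minimal Python sketch. It uses a token overlap coefficient on normalized company names, which is one simple technique among many. The `LEGAL_FORMS` set, the threshold, and the function names are assumptions made for the example.

```python
import re

# Hypothetical legal-form tokens to ignore when comparing company names.
LEGAL_FORMS = {"bv", "nv", "inc", "ltd", "llc", "gmbh"}


def name_tokens(name: str) -> set[str]:
    """Lowercase, drop punctuation, and remove legal-form tokens."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return {w for w in words if w not in LEGAL_FORMS}


def duplicate_signal(a: str, b: str, threshold: float = 0.8) -> dict:
    """Return an overlap score and a suspicion flag, never a merge decision."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        score = 0.0
    else:
        # Overlap coefficient: shared tokens over the smaller token set.
        score = len(ta & tb) / min(len(ta), len(tb))
    return {"score": score, "suspected": score >= threshold}


signal = duplicate_signal("Acme BV", "ACME International B.V.")
print(signal)  # {'score': 1.0, 'suspected': True}
```

A high score only raises suspicion. Under the policy described here, the merge itself would still go to review with supporting evidence.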
Data governance defines policies, ownership, and standards. A data-quality firewall operationalizes those policies in live paths so individual records can be accepted, fixed, blocked, escalated, or verified.
What the 14-day baseline measures
- How many records would have reached the target with missing, invalid, duplicate, stale, or conflicting fields.
- Which issues are deterministic and can be fixed safely under policy.
- Which records need AI judgment, human review, or no action.
- Which paths create the highest operational risk before records reach CRM, ERP, analytics, warehouse management (WMS), or AI systems.
- What evidence and receipt would exist for each decision.
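A shadow-mode baseline is essentially a tally: nothing is written, and each observed record is classified. The following is a minimal Python sketch, assuming each record has already been reduced to a list of hypothetical issue labels and that the `DETERMINISTIC` set comes from policy. The path names are made up for the example.

```python
from collections import Counter

# Hypothetical: issues that policy allows fixing without review.
DETERMINISTIC = {"email_casing", "whitespace", "bad_country_code"}


def classify(issues: list[str]) -> str:
    """Bucket one record by the strongest handling it would need."""
    if not issues:
        return "no_action"
    if all(issue in DETERMINISTIC for issue in issues):
        return "deterministic_fix"
    return "needs_review"


def baseline(observed: dict[str, list[list[str]]]) -> dict[str, Counter]:
    """Count outcomes per path; shadow mode only counts and never writes."""
    return {
        path: Counter(classify(issues) for issues in records)
        for path, records in observed.items()
    }


observed = {
    "webform->crm": [[], ["email_casing"], ["possible_duplicate", "email_casing"]],
    "crm->erp": [["stale_owner"], []],
}
report = baseline(observed)
print(report["webform->crm"]["needs_review"])  # 1
```

Comparing the per-path counters is one way to see which path carries the most records that would have needed review before landing.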
What a receipt contains
A receipt is the proof that a governed record was handled deliberately. It should include the policy version, detected issue, decision, actor or system, timestamp, evidence, writeback status, target verification result, and final outcome. That receipt is what turns data-quality work from a hidden cleanup process into an inspectable operating control.
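The fields listed above map naturally onto a small record type. The sketch below is one possible shape, not a published Refinery schema. The class name, field names, and example values are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Receipt:
    policy_version: str
    detected_issue: str
    decision: str
    actor: str
    timestamp: str            # ISO 8601, UTC
    evidence: dict
    writeback_status: str
    verification_result: str
    outcome: str

    def to_json(self) -> str:
        """Serialize with sorted keys so stored receipts diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)


# Example values are illustrative only.
receipt = Receipt(
    policy_version="example-v1",
    detected_issue="email_casing",
    decision="auto_fix",
    actor="policy-engine",
    timestamp="2024-01-01T00:00:00+00:00",
    evidence={"before": "JOHN@ACME.COM", "after": "john@acme.com"},
    writeback_status="applied",
    verification_result="confirmed",
    outcome="accepted",
)
```

`frozen=True` prevents fields from being reassigned after creation, which suits an audit record. The `evidence` dictionary itself remains mutable, so a stricter design would store it in an immutable form.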
FAQ
What is a data-quality firewall?
A data-quality firewall is a control layer that checks operational records before they land in CRM, ERP, analytics, or AI systems. It can validate, repair, block, escalate, and verify records at the path level.
How is a data-quality firewall different from data observability?
Data observability typically detects issues after they appear in pipelines or reports. A data-quality firewall acts before risky records land in production systems.
Does Refinery replace my CRM, ERP, or data warehouse?
No. Refinery sits between source systems and targets. It governs the records moving through those paths instead of replacing the systems of record.
Can Refinery write back to production systems?
Refinery is designed around policy-gated writeback and verification. Teams should start with read-only or shadow mode, then enable production writeback only for approved paths.
How does AI fit into Refinery?
AI is used by exception for ambiguous judgment, enrichment, or context evaluation. It does not have final authority. Production changes still need policy, confidence, review where required, and verification.
What is a governed path?
A governed path is a source-to-target business flow protected by validation, policy, review, writeback rules, and receipts.
What happens when Refinery is not confident?
Low-confidence or risky decisions are routed to human review with evidence instead of being silently written to the target.
Baseline offer
Get a 14-day shadow-mode baseline to measure which bad records would reach one governed path before production writeback.