In SAP CPI / SAP Integration Suite, error handling is not an “implementation detail”. It is architecture: it defines reliability, operational cost, and the risk of impacting finance, inventory, payroll, or master data.
The distinction teams skip: retry vs reprocess
Retry is automated and happens without a human validating impact. Reprocess is intentional: someone decides to run a message again with context (and a plan to prevent duplicates).
If the team cannot explain that difference per integration, the system is not production-ready: it can run, but it cannot be sustained.
Classify failures before designing the solution
A mature retry policy starts with a simple classification:
- Transient: timeouts, throttling, temporary network/capacity issues.
- Functional: invalid data, business rules, missing authorizations, out-of-range dates.
- Dependency: downstream is degraded, under maintenance, or queues are saturated.
- Contract: payload no longer matches the schema, mandatory fields are missing, or an API changed.
Only the first category is usually a candidate for automatic retries. The other three require controlled reprocessing or data correction.
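The classification above can be sketched as a small lookup that decides whether a failure qualifies for automatic retry. This is a minimal illustration, not a CPI API: the status-to-class mapping, the `FailureClass` enum, and the function names are assumptions, and each receiver needs its own agreed mapping. Python is used only for brevity; in an iFlow the same decision would more likely sit in a script step or a router condition.

```python
from enum import Enum


class FailureClass(Enum):
    TRANSIENT = "transient"
    FUNCTIONAL = "functional"
    DEPENDENCY = "dependency"
    CONTRACT = "contract"


# Hypothetical mapping for illustration; agree on the real one per receiver.
STATUS_TO_CLASS = {
    408: FailureClass.TRANSIENT,   # request timeout
    429: FailureClass.TRANSIENT,   # throttling
    504: FailureClass.TRANSIENT,   # gateway timeout
    503: FailureClass.DEPENDENCY,  # downstream unavailable or under maintenance
    400: FailureClass.CONTRACT,    # payload rejected as malformed
    415: FailureClass.CONTRACT,    # unsupported media type
    403: FailureClass.FUNCTIONAL,  # missing authorization
    422: FailureClass.FUNCTIONAL,  # business rule or data validation failure
}


def classify(status: int) -> FailureClass:
    """Map an HTTP status to a failure class.

    Unknown statuses fall back to FUNCTIONAL so that they are routed
    to a human instead of being retried blindly.
    """
    return STATUS_TO_CLASS.get(status, FailureClass.FUNCTIONAL)


def is_auto_retry_candidate(status: int) -> bool:
    """Only transient failures qualify for automatic retry."""
    return classify(status) is FailureClass.TRANSIENT
```

The conservative default is the design choice worth keeping: an unclassified error should cost a triage step, not a duplicate posting.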
Idempotency: the minimum condition to retry safely
Retry without idempotency is not resilience: it is duplication with a delay. In SAP integration, a single duplicate message can create duplicate documents, double postings, or reconciliation noise.
The key question is not “can we retry?”. It is “what evidence do we have that a second attempt will not create a second business effect?”
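One form that evidence can take is a business-level idempotency key checked at a control point before the business effect happens. The sketch below assumes the key is derived from source system, document type, and business document ID; those fields, the `DedupStore` class, and its in-memory dictionary are hypothetical stand-ins for whatever persistent store or receiver-side check the integration really uses.

```python
import hashlib
from typing import Callable, Dict


def idempotency_key(source_system: str, document_type: str, business_id: str) -> str:
    """Derive a stable key from business identifiers, not from technical message IDs.

    Assumes the fields do not contain the '|' separator.
    """
    raw = "|".join((source_system, document_type, business_id))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class DedupStore:
    """In-memory stand-in for a persistent dedup store (assumption for the sketch)."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def process_once(self, key: str, handler: Callable[[], str]) -> str:
        """Run handler only if the key has not completed before.

        The key is recorded after the handler succeeds, so a failed
        attempt stays eligible for retry or reprocessing.
        """
        if key in self._results:
            return self._results[key]
        result = handler()
        self._results[key] = result
        return result
```

A second call with the same key returns the stored result without invoking the handler again, which is the property a retry or a manual reprocess relies on. A production version would also need to handle two attempts running concurrently, which this sketch does not.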
| Decision | Minimum evidence |
|---|---|
| Which messages can be retried automatically | Per-iFlow list + explicit condition (HTTP status/error class) + max attempts |
| How duplicates are prevented | Business-level idempotency key + dedup at receiver or a control point in integration |
| What happens after final failure | Exception path + backlog queue/inbox + owner and operational SLA |
| How support investigates and coordinates | Consistent correlation ID + useful logs + triage runbook |
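The first and third rows of the table can be illustrated together: a bounded retry loop with exponential backoff that parks the message in a backlog once attempts are exhausted or the error does not qualify. Everything named here (`TransientError`, `deliver_with_retry`, the backlog as a plain list, the default of three attempts) is an assumption made for the sketch, not CPI behavior.

```python
import time
from typing import Callable, Dict, List


class TransientError(Exception):
    """Hypothetical marker for errors that qualify for automatic retry."""


def deliver_with_retry(
    send: Callable[[Dict], None],
    message: Dict,
    backlog: List[Dict],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True on delivery; otherwise park the message and return False."""
    reason = "unknown"
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return True
        except TransientError as exc:
            reason = f"transient, gave up after {attempt} attempt(s): {exc}"
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
        except Exception as exc:
            # Anything else is not retried automatically.
            reason = f"not retryable: {exc}"
            break
    # Exception path: keep the correlation ID and reason, not the payload itself.
    backlog.append({"correlation_id": message.get("correlation_id"), "reason": reason})
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. The backlog entry carries the correlation ID so that support can find the message and decide on reprocessing with context.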
Governance checklist (practical, not paperwork)
For every production integration, validate:
- Retry contract: attempts, backoff strategy, and which errors qualify.
- Reprocess contract: who reprocesses, from where, and how double effects are avoided.
- Runbook: clear diagnostic steps, including “what NOT to do” under pressure.
- Minimum observability: correlation traceability, actionable error context, and payload evidence (without exposing sensitive data).
- Segregation: who can deploy, who can reprocess, and how actions are audited.
This does not replace SAP lifecycle tooling (ALM/transport/runtime monitoring). It reduces operational chaos within the realistic scope of middleware governance.
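The checklist can also be kept as a small, reviewable record per integration, so that gaps are visible before go-live. The sketch below is one possible shape: the `IntegrationGovernance` fields and the `governance_gaps` function are hypothetical names chosen for illustration, and "present" is checked only superficially.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntegrationGovernance:
    """Hypothetical per-integration record mirroring the checklist items."""
    name: str
    max_attempts: int = 0
    backoff: str = ""
    retryable_errors: List[str] = field(default_factory=list)
    reprocess_owner: str = ""
    runbook_location: str = ""
    correlation_id_field: str = ""
    deployers: List[str] = field(default_factory=list)
    reprocessors: List[str] = field(default_factory=list)


def governance_gaps(g: IntegrationGovernance) -> List[str]:
    """Return the checklist items that are still undefined."""
    gaps = []
    if g.max_attempts < 1 or not g.backoff or not g.retryable_errors:
        gaps.append("retry contract")
    if not g.reprocess_owner:
        gaps.append("reprocess contract")
    if not g.runbook_location:
        gaps.append("runbook")
    if not g.correlation_id_field:
        gaps.append("observability")
    if not g.deployers or not g.reprocessors:
        gaps.append("segregation")
    return gaps
```

An empty result only shows that each item has been filled in; whether the runbook is usable or the owner is reachable still needs a human review.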
When this becomes an architecture issue
If the team “solves” incidents by blind retries, the problem is no longer an iFlow. It is integration design and governance: idempotency, data ownership, and a support process not aligned with risk.