CCG Retry Mechanism for CASI API
Overview
The CCG (Convenient Checkout Gateway) implements a robust retry mechanism for all CASI (Communications API for Sycurio Integrations) API calls to ensure reliability and resilience in the face of transient network or service errors.
Importance of Retry for CASI API
Retries play a crucial role in ensuring reliability and a smooth user experience when interacting with the Sycurio system.
Network issues, temporary service outages, or unexpected server errors can cause requests to fail even if there is no problem with the user's action.
Problem with Transitional Call States
When CCG calls the CASI INSPECT endpoint, the underlying Semafone system must be in a stable state to provide accurate inspection data. However, phone calls go through several transitional states where the call is not yet stable:
Transitional Call States:
- Call Ringing: The phone is ringing but not yet answered
- Call Transferring: The call is being transferred to another agent or department
During these transitional states, the Semafone system within CASI cannot reliably inspect the call, resulting in an HTTP 500 error with the message: "Failed to make Semafone inspect URL call".
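As a concrete illustration, the sketch below shows how this specific response could be recognized as a transient condition rather than a fatal one. This is only a sketch: the function name and the assumption that the body is JSON with a detail field are illustrative, not CCG's actual implementation.

```python
# Sketch only: recognize the transitional-state INSPECT failure by its
# status code and detail message (assumed JSON body shape).
SEMAFONE_INSPECT_ERROR = "Failed to make Semafone inspect URL call"

def is_transient_inspect_error(status_code, body):
    """True when the response matches the known transitional-state failure."""
    return status_code == 500 and body.get("detail") == SEMAFONE_INSPECT_ERROR

# Example: the error returned while the call is still ringing or transferring
print(is_transient_inspect_error(500, {"detail": SEMAFONE_INSPECT_ERROR}))  # True
```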
Behavior Without Retry:
- CCG calls CASI INSPECT during a transitional state (e.g., call still ringing)
- CASI returns HTTP 500: {"detail": "Failed to make Semafone inspect URL call"}
- CCG treats this as a permanent failure and terminates the telephonic entry session
- The entire transaction fails, requiring manual intervention or user retry
Why This Is Problematic:
- These failures are temporary and transient, and typically resolve themselves within a short time
- Users experience unnecessary errors for conditions that would self-resolve
- Support teams must manually retry sessions or ask customers to restart the process
- The success rate is artificially lowered due to timing issues.
Without a retry mechanism, these transient errors would cause CCG to terminate the session prematurely, resulting in failed transactions and a poor user experience.
Problem with 504 Gateway Timeout Error
When CCG calls the CASI API, there are scenarios where the request may time out due to temporary network issues or delays in the downstream systems. In such cases, CASI returns an HTTP 504 Gateway Timeout error.
Behavior Without Retry:
- CCG calls a CASI endpoint and encounters a network delay or temporary unavailability
- CASI returns HTTP 504 Gateway Timeout error
- CCG treats this as a permanent failure and terminates the telephonic entry session
- The transaction fails, requiring manual intervention or user retry
Why This Is Problematic:
- These failures are often temporary and can resolve themselves if retried after a short interval
- Users may experience unnecessary errors and interruptions
- Support teams may need to manually retry or assist users, increasing operational overhead
- The overall success rate is reduced due to transient network or system issues
Without a retry mechanism, temporary network or service disruptions can cause avoidable failures, negatively impacting both user experience and operational efficiency.
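For illustration, a timeout can surface to CCG either as an HTTP 504 response from CASI or as a client-side timeout before any response arrives. The sketch below shows both paths using Python's requests library; the endpoint URL, timeout value, and function name are hypothetical, not CCG's actual code.

```python
# Sketch only: both an HTTP 504 from CASI and a client-side timeout are
# treated as retriable conditions. URL and timeout value are illustrative.
import requests

CASI_INSPECT_URL = "https://casi.example.com/inspect"  # hypothetical endpoint

def attempt_inspect(session):
    """Return (retriable, response); response is None when the client timed out."""
    try:
        response = session.get(CASI_INSPECT_URL, timeout=5)
    except requests.Timeout:
        return True, None       # no response at all: retriable network timeout
    if response.status_code == 504:
        return True, response   # gateway timeout reported by CASI: retriable
    return False, response      # anything else is classified separately
```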
When Retries Occur
- Network Timeouts: If a request to a CASI API endpoint times out (e.g., HTTP 504 Gateway Timeout), CCG will automatically retry the request.
- Server Errors: For retriable server-side errors (HTTP 5xx), CCG will retry the request up to a configured number of attempts.
- No Retries: For non-retriable server-side errors (HTTP 5xx) and client-side errors (HTTP 4xx), CCG does not retry, as these indicate issues that retrying will not resolve.
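To make this classification concrete, here is a minimal retriability check. The actual decisions live in the Retriability table, which is not reproduced here; the retriable 5xx set below (502, 503, 504 plus the known transient INSPECT 500) is an illustrative assumption.

```python
# Sketch only: an assumed retriability rule, not the authoritative table.
RETRIABLE_STATUS_CODES = {502, 503, 504}   # assumed retriable server-side errors
SEMAFONE_INSPECT_ERROR = "Failed to make Semafone inspect URL call"

def is_retriable(timed_out, status_code=None, detail=None):
    if timed_out:
        return True                                    # network timeout: retry
    if status_code in RETRIABLE_STATUS_CODES:
        return True                                    # retriable 5xx: retry
    if status_code == 500 and detail == SEMAFONE_INSPECT_ERROR:
        return True                                    # transient transitional-state 500
    return False                                       # other 5xx and all 4xx: no retry

print(is_retriable(False, 404))   # False: client errors are never retried
print(is_retriable(False, 504))   # True
```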
Example Retry Flow
- CCG sends a request to a CASI API endpoint.
- If the response is a network timeout or HTTP 5xx, CCG waits for a short delay and retries.
- If the retry also fails, CCG waits longer (exponential backoff) and retries again.
- If all retries fail, the error is logged and surfaced to the user or calling system.
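The flow above can be expressed as a short loop. The attempt count, initial delay, and doubling backoff mirror the defaults described in the next section; the function names and logger are illustrative, not CCG's actual code.

```python
# Sketch only: retry retriable failures with exponential backoff, then
# surface the final error. Names and defaults are illustrative.
import logging
import time

logger = logging.getLogger("ccg.casi")

MAX_ATTEMPTS = 4          # 1 initial attempt + 3 retries
INITIAL_DELAY_S = 0.6     # 600 ms before the first retry
BACKOFF_FACTOR = 2        # delay doubles after each failed attempt

def call_with_retry(call_once, is_retriable):
    """call_once() returns a result or raises; is_retriable(exc) decides whether to retry."""
    delay = INITIAL_DELAY_S
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return call_once()
        except Exception as exc:
            if attempt == MAX_ATTEMPTS or not is_retriable(exc):
                logger.error("CASI call failed on attempt %d, giving up: %s", attempt, exc)
                raise                       # surface the error to the caller
            logger.warning("CASI call failed on attempt %d, retrying in %.1fs: %s",
                           attempt, delay, exc)
            time.sleep(delay)
            delay *= BACKOFF_FACTOR         # exponential backoff
```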
Retry Strategy
- Triggering Retries: If CCG encounters an error that is marked as retriable in the Retriability table, it will automatically trigger a retry according to the configured strategy.
- Max Attempts: Typically 4 attempts (1 initial + 3 retries), but this can be changed as needed.
- Delay Between Retries: After a failed attempt, CCG waits a short time before trying again. This waiting period usually starts at 600 milliseconds, but can be adjusted.
- Backoff: The waiting time increases exponentially with each retry (exponential backoff) to avoid overwhelming the service.
- Logging: All retry attempts and failures are recorded for monitoring and troubleshooting.
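With these defaults (4 attempts and a 600 ms initial delay), the wait schedule stays short. A quick sketch of the implied delays, assuming the backoff factor is 2:

```python
# Delay schedule implied by the defaults above; the doubling factor is an assumption.
BASE_DELAY_MS = 600
BACKOFF_FACTOR = 2
MAX_RETRIES = 3  # retries after the initial attempt

delays = [BASE_DELAY_MS * BACKOFF_FACTOR ** i for i in range(MAX_RETRIES)]
print(delays)  # [600, 1200, 2400] -> at most ~4.2 s of waiting in total
```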
Impact of Retry
1. Positive Impact
- Improved User Experience: Users are less likely to encounter failures due to temporary call state issues, resulting in smoother telephonic payment sessions.
- Higher Success Rate: Telephonic entry sessions are more likely to complete successfully, reducing the need for manual intervention or user retries.
- Operational Efficiency: Support teams spend less time addressing transient failures, allowing them to focus on genuine issues.
- System Resilience: The system gracefully handles temporary network or telephony issues, increasing overall reliability.
2. Negative Impact
- Increase in 500 Error Count: If an INSPECT call is made after a REMOVE call, a 500 error with {"detail": "Failed to make Semafone inspect URL call"} may occur due to a known CASI-side bug (see issue).
- False Error Metrics: Because the retry logic treats this specific 500 error as retriable, the number of 500 errors may appear higher in monitoring dashboards. However, these errors do not impact user experience or system performance.
- Dashboard Consideration: To avoid misleading error metrics, dashboard queries should be adjusted to filter out or separately categorize these known, non-impactful 500 errors.
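As an illustration of that dashboard adjustment, the sketch below separates the known, non-impactful 500 from genuine server errors. The record shape (status and detail fields) and the filtering approach are assumptions, not the actual dashboard query.

```python
# Sketch only: exclude the known benign 500 from the error-rate metric.
KNOWN_TRANSIENT_DETAIL = "Failed to make Semafone inspect URL call"

def is_actionable_server_error(record):
    """True for 5xx log records that should count toward the error dashboard."""
    status = record.get("status", 0)
    if status == 500 and record.get("detail") == KNOWN_TRANSIENT_DETAIL:
        return False             # known, non-impactful CASI-side error
    return status >= 500          # every other server-side error still counts

records = [
    {"status": 500, "detail": KNOWN_TRANSIENT_DETAIL},  # filtered out
    {"status": 500, "detail": "unexpected failure"},    # still counted
]
print([is_actionable_server_error(r) for r in records])  # [False, True]
```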
Conclusion
The retry mechanism in CCG for the CASI API is designed to address the unique challenges of telephonic payment sessions, ensuring reliability and a seamless user experience. While it brings significant business value and operational benefits, it is important to be aware of its impact on error metrics and monitoring. Ongoing collaboration between development and monitoring teams will help ensure that dashboards accurately reflect system health and that users continue to benefit from a robust and resilient payment process.