Reason for Outage (RFO) - cfly.co DNS Resolution Failure
Date of Incident: February 17, 2026
Duration: Approximately 03:00 MST to 05:16 MST (primary impact ~2 hours 16 minutes)
Affected Services: SIP registrations (via SBCs), Clearfly Portal API/management functions
Summary:
On February 17, 2026, at approximately 03:00 MST, SIP registrations began failing because internal DNS resolvers could not resolve cfly.co records, which are used throughout the core voice infrastructure. The failure was caused by a temporary upstream routing/peering issue that impacted reachability to the .co TLD anycast name servers from approximately 01:37 MST to 05:14 MST. Once cached cfly.co records expired, recursion could not re-fetch the zone's delegation (glue) from the .co servers, even though the authoritative cfly.co servers are co-located in our data centers. Services were restored via workarounds between 03:58 and 05:16 MST, and permanent preventive configurations have since been implemented and verified.
Impact:
SIP registrations and static peering failed at increasing rates throughout the outage window.
Secondary impact: Portal API failures after a cache flush during troubleshooting (~04:45–05:16 MST).
No evidence of a widespread .co TLD outage; the issue was path-specific to our upstream carrier/peering.
Timeline (all times in MST on 2026.02.17):
03:00 - Monitoring detects initial registration failures.
03:14 - Automated anomaly alert triggered (registration count threshold).
03:15 - Engineers paged.
03:19 - Incident conference bridge activated for coordinated troubleshooting.
03:38 - Troubleshooting narrows the issue to internal DNS recursion failures specific to cfly.co records; recursion for other domains is normal, and cfly.co resolves correctly when two of the three authoritative servers are queried directly (a reproduction of this check is sketched after the timeline).
03:58 - Temporary workaround: Added a cfly.co authoritative server as a direct resolver on the SBCs; registrations restored.
04:45 - A DNS cache flush during troubleshooting causes a secondary failure: Portal API/management becomes inaccessible because the flushed cfly.co records cannot be re-resolved.
04:51 - Stable workaround: Added cfly.co as an authoritative zone on the caching resolvers, bypassing Internet recursion for zones under our control. Root cause investigation continues.
05:16 - DNS config updated on Portal server, restoring full API functionality.
20:00 - Updated forwarding configuration phased in on internal caching servers. SBC configurations updated and DNS configurations reverted.
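For reference, the 03:38 check can be reproduced with a short script along the lines of the sketch below. It assumes the dnspython library is available; the resolver and authoritative addresses shown are placeholders, not Clearfly's actual servers.

```python
# Minimal sketch of the 03:38 diagnosis check, assuming dnspython is installed.
# Addresses below are placeholders, not Clearfly's real resolvers/servers.
import dns.message
import dns.query
import dns.rdatatype

ZONE = "cfly.co"
CACHING_RESOLVER = "10.0.0.53"                            # internal caching resolver (placeholder)
AUTHORITATIVE = ["10.0.1.53", "10.0.2.53", "10.0.3.53"]   # co-located authoritative servers (placeholders)

def check(server: str) -> str:
    """Send a direct UDP query for the zone's A record and report the outcome."""
    query = dns.message.make_query(ZONE, dns.rdatatype.A)
    try:
        response = dns.query.udp(query, server, timeout=3)
        answers = [r.to_text() for rrset in response.answer for r in rrset]
        return f"{server}: {answers or 'empty answer'}"
    except Exception as exc:  # a timeout here indicates the failing resolution path
        return f"{server}: FAILED ({exc})"

# During the incident, recursion via the caching resolver failed once cached
# records expired, while direct queries to the authoritative servers succeeded.
print(check(CACHING_RESOLVER))
for ns in AUTHORITATIVE:
    print(check(ns))
```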
Cause:
An upstream routing/peering issue (likely carrier-scheduled maintenance) caused one-way traffic blackholing toward the .co TLD anycast name servers from Clearfly’s network between approximately 01:37 MST and 05:14 MST. Once local cache TTLs expired, recursive resolvers could no longer re-fetch the cfly.co delegation (NS/glue records) from the .co servers, so cfly.co records stopped resolving, even though the authoritative cfly.co servers are co-located in our data centers. Depending on external recursion for a domain whose authoritative servers we operate created unnecessary exposure to external disruptions.
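As an illustration of the path-specific nature of the failure, a check along the lines of the sketch below (again assuming dnspython; the public resolver used to bootstrap the server list is an arbitrary placeholder) probes each .co TLD name server for the cfly.co delegation. From the affected path these probes would time out, while the same servers answered from other vantage points.

```python
# Sketch of a reachability check against the .co TLD name servers (dnspython assumed).
import dns.message
import dns.query
import dns.rdatatype
import dns.resolver

# Learn the .co TLD server names/addresses via a resolver that is still working.
resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["9.9.9.9"]  # any reachable public resolver; placeholder choice
tld_servers = []
for ns in resolver.resolve("co.", "NS"):
    for addr in resolver.resolve(ns.target, "A"):
        tld_servers.append((str(ns.target), addr.address))

# Probe each TLD server for the cfly.co delegation; we only care whether a
# response comes back at all, not about its contents.
for name, addr in tld_servers:
    query = dns.message.make_query("cfly.co.", dns.rdatatype.NS)
    try:
        dns.query.udp(query, addr, timeout=3)
        print(f"{name} ({addr}): reachable")
    except Exception as exc:
        print(f"{name} ({addr}): unreachable ({exc})")
```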
Preventive Actions (Implemented & Verified):
Configured domain-specific forwarders on all internal caching resolvers for authoritative domains under our control (e.g., cfly.co). Queries now forward directly to our authoritative servers, bypassing Internet recursion and external TLD/root dependencies; a verification sketch follows this list.
Updated SBC behavior to retain and use cached DNS records beyond TTL expiration during resolver failures (a serve-stale sketch illustrating the concept also follows below).
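The forwarder change can be spot-checked with a script along the lines of the following sketch (dnspython assumed; addresses are placeholders): it compares the answer returned by a caching resolver with a direct query to a co-located authoritative server.

```python
# Verification sketch for the domain-specific forwarders, assuming dnspython.
import dns.resolver

ZONE = "cfly.co"
CACHING_RESOLVER = "10.0.0.53"   # internal caching resolver (placeholder)
AUTHORITATIVE = "10.0.1.53"      # co-located authoritative server (placeholder)

def answers_from(server: str) -> set[str]:
    """Return the set of A records for the zone as seen by one server."""
    r = dns.resolver.Resolver(configure=False)
    r.nameservers = [server]
    return {rr.to_text() for rr in r.resolve(ZONE, "A")}

via_cache = answers_from(CACHING_RESOLVER)
direct = answers_from(AUTHORITATIVE)
print("forwarding OK" if via_cache == direct else f"mismatch: {via_cache} vs {direct}")
```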
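The second item describes serve-stale behavior. The sketch below illustrates the general idea in plain Python, using the standard library's resolver; it is an illustration of the concept only, not the SBC vendor's implementation.

```python
# Serve-stale sketch: keep answering from cache when resolution starts failing.
import socket
import time

_cache = {}  # hostname -> (addresses, expiry_timestamp)

def resolve_with_stale_fallback(hostname: str, ttl: int = 300) -> list[str]:
    now = time.time()
    cached = _cache.get(hostname)
    if cached and cached[1] > now:
        return cached[0]                       # fresh cache hit
    try:
        infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
        addresses = sorted({info[4][0] for info in infos})
        _cache[hostname] = (addresses, now + ttl)
        return addresses
    except socket.gaierror:
        if cached:
            return cached[0]                   # resolver failed: serve the stale entry
        raise                                  # nothing cached; surface the failure

# The first call populates the cache; later calls keep returning the last known
# addresses even if resolution begins failing after the TTL has expired.
print(resolve_with_stale_fallback("example.com"))
```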
Status:
Incident resolved. No ongoing impact.