Network connectivity issues

Incident Report for Abion

Postmortem

1. Incident Summary

Between 11:41 and 12:03 UTC+2 on 2025-05-13, all traffic traversing our on-prem firewall experienced severe to complete packet loss. The disruption stemmed from a previously unknown software defect in the firewall OS. The same defect also prevented fail-over to the secondary firewall. Service was fully restored after an automated reboot of the primary firewall.

  • Duration: 22 minutes total impact
  • Affected services: Web Hosting, Virtual Server Platform, URL Redirector, Abion Core, Hosted Exchange, Standard Email (IMAP)
  • Customer impact: Websites and servers unreachable; SMTP delivery delays of up to 20 minutes
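
The 22-minute figure follows directly from the start and end times above. As a minimal sketch (the variable names are illustrative and not part of any tooling used during the incident), the duration and the equivalent UTC start time can be checked like this:

```python
from datetime import datetime, timedelta, timezone

# The report gives all times in UTC+2.
LOCAL = timezone(timedelta(hours=2))

# Start and end of customer impact, taken from the incident summary.
start = datetime(2025, 5, 13, 11, 41, tzinfo=LOCAL)
end = datetime(2025, 5, 13, 12, 3, tzinfo=LOCAL)

# Total impact in whole minutes.
impact_minutes = int((end - start).total_seconds() // 60)

# The same start time expressed in UTC, for cross-referencing other logs.
start_utc = start.astimezone(timezone.utc).strftime("%H:%M")

print(impact_minutes)  # 22
print(start_utc)       # 09:41
```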

2. Detailed Timeline (UTC+2)

Time Event
11:41 On-call engineer receives multiple DOWN alerts
11:50 Ticket raised with NOC, no upstream issues found
12:03 Network restores automatically as firewall reboots
12:05 Root cause analysis begins
12:45 An acknowledged software defect in our firewall identified as root cause
14:20 Scheduled maintenance planned for firmware upgrade, subscribers notified

3. Impact

  • Services affected: All on-prem platforms (Web Hosting, VPS, Abion Core, Hosted Exchange, Standard Email/IMAP, URL Redirector).
  • Customer experience: Complete service outage; e-mail queuing/delays.

4. Root Cause

A verified firmware defect in our firewall OS triggers an out-of-memory crash during normal traffic and simultaneously breaks HA fail-over. The vendor documents that the issue is fixed in a later release.

5. Detection & Response

  • Detection: Multiple Tech Department staff simultaneously observed loss of connectivity.
  • Immediate response: Raised ticket with ISP; began on-prem investigation; reviewed firewall logs, which revealed repeated out-of-memory (OOM) events.
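
The log review described above amounts to counting OOM messages over time. The sketch below shows one way this could be done. It rests on assumptions: the report does not give the firewall's log format, so the timestamp prefix, the match pattern, and the sample lines are all hypothetical and invented for illustration, not taken from the actual logs.

```python
import re
from collections import Counter

# Hypothetical pattern: the real firewall log wording is not given in this report.
OOM_PATTERN = re.compile(r"out of memory|\bOOM\b", re.IGNORECASE)


def count_oom_events_per_minute(lines):
    """Count log lines that mention an OOM event, grouped by minute.

    Assumes every line starts with a "YYYY-MM-DD HH:MM:SS" timestamp.
    """
    counts = Counter()
    for line in lines:
        if OOM_PATTERN.search(line):
            minute = line[:16]  # "YYYY-MM-DD HH:MM"
            counts[minute] += 1
    return counts


# Invented sample lines, for illustration only.
sample_lines = [
    "2025-05-13 11:41:07 kernel: Out of memory: killed process",
    "2025-05-13 11:41:52 ha: heartbeat lost",
    "2025-05-13 11:42:10 kernel: OOM event in dataplane",
    "2025-05-13 11:42:30 kernel: out of memory",
]

oom_counts = count_oom_events_per_minute(sample_lines)
for minute, count in sorted(oom_counts.items()):
    print(minute, count)
```

Grouping by minute makes a burst of repeated events easy to line up against the alert timeline.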

6. Resolution

Service was restored when the primary firewall rebooted automatically at 12:03. The incident was closed by 12:44, after log entries were correlated with the vendor advisory.

7. Preventive / Corrective Actions

We will upgrade both of our firewalls during the next maintenance window, scheduled for 29 May 2025.

Posted May 13, 2025 - 15:16 CEST

Resolved

The incident is resolved.
Posted May 13, 2025 - 12:31 CEST

Monitoring

Upstream provider confirmed no issues on their end. Investigation continues.
Posted May 13, 2025 - 12:09 CEST

Update

Network connectivity has returned to normal. We're investigating root cause.
Posted May 13, 2025 - 12:03 CEST

Investigating

We are currently experiencing connectivity issues. This outage is affecting all on-prem services.
Posted May 13, 2025 - 11:55 CEST