Network connectivity to datacentres
Incident Report for Digital Craftsmen
Resolved
The provider of our intersite links has confirmed that a configuration error introduced a single point of failure into what should be geographically diverse routes. This error has now been corrected and such a failure cannot happen again.

The links have now been up and stable for over 24 hours, so we are marking this incident as resolved.

We are still conducting our internal review to further improve our response capabilities, and a formal Reason for Outage (RFO) report will be sent to customers within the next 7 days.

Once again I apologise for the outage and the inconvenience caused.

Simon Wilcox
Posted Feb 25, 2022 - 15:23 GMT
Monitoring
Both intersite links recovered just before 3pm and have been stable for the last 35 minutes, so we have cancelled the DR invocation.

However, we are still waiting on a report from our provider. Until we have an explanation, we will work with clients to mitigate the impact on them should the fault recur.

I am sorry for the extended outage. A full report will be made available in the next few days.

Simon Wilcox
Posted Feb 24, 2022 - 15:32 GMT
Update
We have now identified that one of the intersite links between our two datacentres has failed and the second link is reporting severe packet loss. This is related to the upstream incident reported earlier and we're working with our provider to get these links restored.

Without an ETA for restoration of the intersite links, we have decided to invoke our DR plan and begin recovering our clients from one site into the other. We will be in touch with affected clients directly regarding the restoration of services.

I am sorry for the extended nature of this outage.

The next update for this incident will be at 15:30.

Simon Wilcox
Posted Feb 24, 2022 - 15:03 GMT
Identified
Connectivity to our production systems has mostly been restored and our external monitors are showing sites recovering.
Some customers are still experiencing connectivity issues and we are working with them to re-establish their links as soon as possible.

Simon Wilcox
Posted Feb 24, 2022 - 13:40 GMT
Investigating
We have lost some connectivity to our datacentres, which may be affecting client sites. We are currently investigating and are in contact with our network providers.

We'll update here as soon as we learn more.

Simon Wilcox
Posted Feb 24, 2022 - 12:28 GMT
This incident affected: Infrastructure (Internet connectivity, Core network, Compute Cloud, Storage Cloud, Backup) and Platform Applications (Monitoring, RMM).