New - Host failure in Enfield - Client machines rebooted

Incident Report for Digital Craftsmen

Resolved

This incident has been resolved.
Posted Aug 24, 2019 - 13:01 BST

Update

Good Afternoon,

The host has now been stable for over 24 hours and all tests have passed along with being placed back into the cluster.

Kind Regards
Rhodri Metcalfe-Davies
Posted Aug 24, 2019 - 13:01 BST

Monitoring

Hi,

The physical host had stopped at the underlying vmware system and required rebooting to recover it. We're running a series on tests on it to ensure that it is fully operational before returning it to the cluster.

All client services were recovered within a few minutes of the failure. We apologise for this outage and will ensure that any learnings from the tests are implemented.

regards

Paul Orrock
Technical Director.
Posted Aug 23, 2019 - 12:48 BST

Investigating

Hi,

We have had a host fail in Enfield. The client virtual machines that were on it have been automatically restarted by the cluster manager on another host. Clients may have noticed a short period of downtime while the high availability service actioned the restarts. All client machines should now be back up.

Further updates will follow.

regards,

Paul Orrock
Posted Aug 23, 2019 - 11:14 BST
This incident affected: Locations (Enfield Datacentre) and Infrastructure (Compute Cloud).