Host failure in LHC

Incident Report for Digital Craftsmen

Resolved

Hi,

We believe this problem is now fixed. It was caused by a failure in the VMware layer. The host has been rebooted and has been stable since. We will be reintroducing it to the cluster over the next couple of days. We apologise for any inconvenience caused.

regards,

Paul Orrock
Technical Director

Posted Apr 10, 2019 - 14:33 BST

Update

We are continuing to investigate this issue.

All of the virtual machines that were on that host were restarted automatically on another host within a minute of the incident. We have plenty of capacity in LHC and there are no performance issues, which is why this is now being marked as operational.

The failed host is being investigated and will not be back in the cluster until we understand the failure and are confident in the fix.

There should be no recurring service issues at this point. If you are seeing any problems with your machines or service, please contact support@digitalcraftsmen.com

regards,

Paul Orrock

Posted Apr 09, 2019 - 14:44 BST

Investigating

One of our VMware hosts has failed in our London Hosting Centre. The virtual machines that were on that host have been automatically restarted on other hosts. We are investigating further.

Posted Apr 09, 2019 - 14:06 BST

This incident affected: Infrastructure (Compute Cloud) and Locations (Welwyn Garden City Datacentre).