Issues with Management Server controlling Production Client Cluster in Enfield
Incident Report for Digital Craftsmen
Resolved
Good Afternoon,

Last night's work to move the final batch of machines, along with the hosts, to the new cluster has been completed successfully.
Our monitoring reports that everything is healthy. However, if you notice anything out of place, please create a ticket with us by emailing support@digitalcraftsmen.com

Kind Regards
Rhodri Metcalfe-Davies
Posted Aug 30, 2019 - 15:14 BST
Update
Good Afternoon,

As you are aware, two weeks ago we had an issue with the management server for one of our VMware clusters, and we have since been gradually moving machines over to a new cluster under a new management server.
We will be making the final changes arising from this incident this Thursday, 29th August, between 8pm and midnight.
This may require some downtime, and affected clients will be updated individually with a list of their final machines to be moved.

Once again, we apologise for any inconvenience this may have caused, and we want to assure you that work has been put in place to prevent this from happening again.

Kind Regards
Rhodri Metcalfe-Davies
Posted Aug 27, 2019 - 17:22 BST
Update
Hi,

We have moved most client machines to the new management cluster; however, a number still remain to be moved. To avoid interfering with backup processes and other services, we will reschedule the remainder and, where appropriate, agree suitable times for the work with each client.

regards,

Paul
Posted Aug 16, 2019 - 00:21 BST
Monitoring
All clients in Enfield may experience some service slowdowns and very short periods of downtime over the next couple of hours while we work to resolve this issue.

We will be monitoring closely for affected services and apologise for the inconvenience caused.

regards,

Paul Orrock
Technical Director
Posted Aug 15, 2019 - 21:20 BST
Update
Unfortunately we've been unable to resolve the issues without requiring downtime.

We are currently contacting all affected clients to schedule a few short periods of downtime to recover proper administrative control of each virtual machine.

We apologise for the inconvenience caused.

Regards,

Simon Wilcox
Managing Director
Posted Aug 15, 2019 - 17:28 BST
Identified
Hi,

We are continuing to have issues with our management server, along with some other issues related to it. We're endeavouring to find a fix in conjunction with VMware and will post further updates as we know more.

There is a chance we will have to schedule an emergency downtime period to reboot all the VMs in Enfield. If this happens, we expect downtime to be no more than a couple of minutes and will try to do it in as controlled a manner as possible. Further updates to follow.

regards,

Paul Orrock
Technical Director
Posted Aug 15, 2019 - 09:48 BST
Investigating
Hi,

We are currently experiencing issues with the management server that controls our Enfield client cluster. Client virtual machines continue to run unaffected and, in the event of a host failure, they should restart on another host in the cluster. We are currently trying to recover the management server with assistance from VMware support.

There should currently be no downtime for client machines, but we regard this as an at-risk notification because we are restricted in the actions we can take on the cluster.

Further updates will follow.

regards,

Paul Orrock
Technical Director
Posted Aug 14, 2019 - 19:07 BST
This incident affected: Locations (Enfield Datacentre) and Infrastructure (Compute Cloud).