[13:25] - Starting planned hardware maintenance on ESX01 virtualisation platform
[13:29] - NetOps team notices that some VMs are down (services are unstable)
[13:34] - NetOps team confirms all VMs are up & running on the previously upgraded ESX02 (cf. #20190115A), but the GUIs are still down (diagnosis in progress on ESX02)
[13:38] - NetOps/IS&T teams escalate to Commissioning subcontractor regarding a potential network issue
[13:47] - ESX01 back online; GUIs are still down
[14:01] - NetOps prepares the DNS switchover (from site B to site A) but does not trigger it yet (see the sketch below the timeline)
[14:11] - Commissioning subcontractor acknowledges a network issue: starting network debug with on-site actions (IS&T)
[14:17] - Network issue confirmed: need to test the whole wiring
[14:29] - Network issue solved, but GUIs are still down
[14:33] - DNS switch triggered
[14:40] - Rebooting ESX02
[14:46] - VMs are partially back on ESX01 and still booting up
[14:46] - ESX02 is up & running: load-balancing VMs between hosts
[15:13] - Services are up & running; testing the network failover
[15:20] - Incident closed. The DNS switchback will be initiated in three hours.
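For reference, below is a minimal sketch of how such a DNS switchover (and the later switchback) could be scripted, assuming a DNS server that accepts RFC 2136 dynamic updates and the Python dnspython library; the zone, record name, addresses and TSIG key are hypothetical placeholders, not the values actually used by NetOps.

```python
import dns.update
import dns.query
import dns.tsigkeyring
import dns.rcode

# TSIG key authorising dynamic updates (hypothetical name and secret)
keyring = dns.tsigkeyring.from_text({"netops-key.": "bmV0b3BzLXNlY3JldA=="})

# Dynamic update against the zone holding the service records (hypothetical zone)
update = dns.update.Update("services.example.internal", keyring=keyring)

# Re-point the GUI front-end record at the site A address instead of site B
update.replace("gui", 300, "A", "10.1.0.10")

# Send the update to the primary DNS server (hypothetical address) and report the outcome
response = dns.query.tcp(update, "10.1.0.53", timeout=10)
print("DNS switchover result:", dns.rcode.to_text(response.rcode()))
```

The switchback planned for three hours after incident closure would be the same operation with the record pointed back at the site B address.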
The incident was caused by a wiring fault: labels were missing on the network cables carrying traffic and synchronisation between VMs, which made human error during hardware maintenance more likely. A rework of the entire network and power wiring will be planned. An audit will be performed first, by the end of this month (2019-01); depending on its results, several on-site maintenance windows will follow.