Stability issues in Azure West US2

Incident Report for ESS (Public)

Resolved

This incident has been resolved.
Posted Apr 09, 2021 - 10:23 UTC

Monitoring

We have mitigated the issue and we will be monitoring the platform.
Posted Apr 09, 2021 - 08:28 UTC

Update

Engineers are still working on the mitigation/resolution for this incident.

We'll provide another update in 2 hours.
Posted Apr 09, 2021 - 08:07 UTC

Update

Engineers are still working on the mitigation/resolution for this incident.

We'll provide another update in 2 hours.
Posted Apr 09, 2021 - 06:02 UTC

Update

We're getting to the tail end of the mitigation/resolution for this incident.

We'll provide another update in 2 hours.
Posted Apr 09, 2021 - 03:21 UTC

Update

Elasticsearch and Kibana connectivity in GCP us-west1 and Azure azure-westeurope are now marked as operational.
Posted Apr 09, 2021 - 01:07 UTC

Update

Still making good progress on the mitigation/resolution for this incident.

We'll provide another update in 2 hours.
Posted Apr 09, 2021 - 01:05 UTC

Update

We are making progress on the mitigation/resolution for this incident.

We'll provide another update in 2 hours.
Posted Apr 08, 2021 - 23:19 UTC

Update

We are continuing to work through the mitigation/resolution for this incident.

We'll provide another update in 2 hours.
Posted Apr 08, 2021 - 21:16 UTC

Update

The new approach referenced in the previous update appears to be working well, and we are continuing to make progress on mitigating this incident.

We will have another update in 2 hours.
Posted Apr 08, 2021 - 18:59 UTC

Update

We are continuing to work on the mitigation for this issue. Unfortunately, we have run into problems with a few of our approaches and are now pursuing a different approach to the mitigation.

We will have another update in 60 minutes.
Posted Apr 08, 2021 - 17:57 UTC

Update

We are continuing to work on mitigating the issue and are seeing health restored to more deployments as we do.

Thank you again for your patience as we work through this. We'll have another update in 60 minutes.
Posted Apr 08, 2021 - 16:31 UTC

Update

Mitigation efforts are ongoing and making progress.
We thank you for your patience as we work to restore full service in the affected regions.

Updates will continue in 30 minutes.
Posted Apr 08, 2021 - 15:59 UTC

Update

Mitigation efforts are ongoing in the affected regions.
Engineers are working hard to restore service and ensure access to your deployment is available as soon as possible.

Updates will continue in 30 minutes.
Posted Apr 08, 2021 - 15:25 UTC

Identified

We have narrowed the focus of the investigation to a problem with the container network on certain hosts.
Mitigation actions have been determined, and we are working towards applying these across the affected hosts.

We have also identified the same issue in Azure West Europe (azure-westeurope) and GCP us-west1.

Further updates will be provided within 30 minutes.
Posted Apr 08, 2021 - 14:45 UTC

Investigating

We are investigating reports of some issues in our Azure West US2 region.
Access to customer clusters may be impacted, and we are working to determine and mitigate the problem.

Further updates will follow within 30 minutes.
Posted Apr 08, 2021 - 14:16 UTC
This incident affected: Azure Washington (azure-westus2) (Elasticsearch connectivity: Azure azure-westus2, Kibana connectivity: Azure azure-westus2), Azure Netherlands (azure-westeurope) (Elasticsearch connectivity: Azure azure-westeurope, Kibana connectivity: Azure azure-westeurope), and GCP Oregon (us-west1) (Elasticsearch connectivity: GCP us-west1, Kibana connectivity: GCP us-west1).