Automated snapshot failures

Incident Report for Elastic Cloud (Public)

Resolved

We’ve set up an automated process to snapshot affected clusters. The permanent fix for affected clusters will be going out with our next scheduled deployment. As the impact is now remediated, we’re closing this incident. If you have any additional concerns, please reach out to your support team! Thank you for your patience!
Posted Mar 05, 2020 - 19:19 UTC

Update

We are still working on a fix for this issue. In the interim, we are manually snapshotting any affected deployments. We will update again in 4 hours.
Posted Mar 05, 2020 - 15:06 UTC

Update

We are still working on a fix for this issue. In the interim, we are manually snapshotting any affected deployments. We will update again in 4 hours.
Posted Mar 05, 2020 - 11:03 UTC

Update

We are continuing to work on a fix for this issue. In the interim, we are manually snapshotting any affected deployments. We will update again in 4 hours.
Posted Mar 05, 2020 - 07:10 UTC

Identified

We have identified the issue and are currently developing a fix. We have manually triggered a snapshot for impacted clusters, so the impact of this incident is currently low. We will provide another update in 4 hours.
Posted Mar 05, 2020 - 02:12 UTC

Update

We are continuing to investigate this issue.

While the investigation is ongoing, we will run a manual process to create snapshots for all impacted clusters.

We will have another update in approximately 4 hours.
Posted Mar 04, 2020 - 22:41 UTC

Investigating

We have identified an issue with automated snapshots that is currently impacting all regions and providers. Our investigation has determined that up to 2% of deployments are affected. We are working on a remediation for the core issue. In the meantime, if you notice that your deployments are not taking regular automated snapshots, you can trigger snapshots manually as a temporary measure. Manual snapshots can be initiated through your Cloud management console or through the Elasticsearch API.
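For the Elasticsearch API route, the following is a minimal Python sketch of a manual snapshot request, not an official procedure. It assumes the deployment's snapshot repository is named `found-snapshots`, that you authenticate with an API key, and that the endpoint URL shown is a placeholder for your own deployment endpoint. The helper names and the snapshot naming scheme are hypothetical.

```python
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def default_snapshot_name(now=None):
    """Return a lowercase, timestamped snapshot name (hypothetical naming scheme)."""
    now = now or datetime.now(timezone.utc)
    return "manual-" + now.strftime("%Y.%m.%d-%H.%M.%S")


def build_snapshot_request(endpoint, repository, snapshot_name, api_key):
    """Build, but do not send, a PUT request for the create-snapshot API.

    endpoint   -- Elasticsearch endpoint of the deployment (placeholder value below)
    repository -- snapshot repository name (assumed: "found-snapshots")
    api_key    -- base64-encoded API key (assumption: API key auth is enabled)
    """
    path = "/_snapshot/{}/{}".format(
        urllib.parse.quote(repository, safe=""),
        urllib.parse.quote(snapshot_name, safe=""),
    )
    # wait_for_completion=false returns as soon as the snapshot has been started.
    url = endpoint.rstrip("/") + path + "?wait_for_completion=false"
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": "ApiKey " + api_key},
    )


# Example usage (commented out because it performs a real network call):
# req = build_snapshot_request(
#     "https://my-deployment.example.invalid:9243",  # placeholder endpoint
#     "found-snapshots",
#     default_snapshot_name(),
#     "<your-api-key>",
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```

The request is built separately from being sent so that the URL and headers can be checked before anything touches the cluster.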

We apologize for the inconvenience and will have another update for you in approximately two hours.
Posted Mar 04, 2020 - 21:04 UTC
This incident affected: Azure Netherlands (azure-westeurope) (Azure Infrastructure health: azure-westeurope), GCP Mumbai (asia-south1) (GCP Infrastructure health: asia-south1), GCP Oregon (us-west1) (CSP Infrastructure Health: GCP us-west1), AWS Sydney (ap-southeast-2) (AWS Infrastructure health: ap-southeast-2), AWS N. Virginia (us-east-1) (AWS Infrastructure health: us-east-1), Azure Singapore (azure-southeastasia) (Deployment snapshots: Azure azure-southeastasia), GCP Belgium (europe-west1) (GCP Infrastructure health: europe-west1), AWS Singapore (ap-southeast-1) (Deployment snapshots: AWS ap-southeast-1), AWS N. California (us-west-1) (AWS Infrastructure health: us-west-1), GCP London (europe-west2) (GCP Infrastructure health: europe-west2), Azure Washington (azure-westus2) (Azure Infrastructure health: azure-westus2), GCP Iowa (us-central1) (GCP Infrastructure health: us-central1), GCP Frankfurt (europe-west3) (GCP Infrastructure health: europe-west3), AWS Tokyo (ap-northeast-1) (AWS Infrastructure health: ap-northeast-1), AWS São Paulo (sa-east-1) (AWS Infrastructure health: sa-east-1), GCP Montreal (northamerica-northeast1) (GCP Infrastructure health: northamerica-northeast1), Azure Tokyo (azure-japaneast) (Azure Infrastructure health: azure-japaneast), GCP Tokyo (asia-northeast1) (GCP Infrastructure health: asia-northeast1), Azure Virginia (azure-eastus2) (Azure Infrastructure health: azure-eastus2), AWS Oregon (us-west-2) (AWS Infrastructure health: us-west-2), GCP Sydney (australia-southeast1) (GCP Infrastructure health: australia-southeast1), AWS Frankfurt (eu-central-1) (AWS Infrastructure health: eu-central-1), AWS London (eu-west-2) (AWS Infrastructure health: eu-west-2), and AWS Ireland (eu-west-1) (AWS Infrastructure health: eu-west-1).