Proxy layer restarts causing increased errors and slower log ingestion

Incident Report for ESS (Public)

Resolved

The incident is resolved.
Posted Jun 27, 2019 - 17:34 UTC

Monitoring

The ingest delays in eu-west-1 have returned to normal. We will be monitoring this incident for the next 30 minutes.
Posted Jun 27, 2019 - 16:25 UTC

Update

The logging delay in AWS ap-southeast-1 is now resolved. We are still working on eu-west-1.
Posted Jun 27, 2019 - 15:25 UTC

Update

We have rolled back the proxy release across all regions, and 5xx rates have dropped to normal in AWS eu-west-1 and AWS ap-southeast-1. We are currently working on improving the log ingestion rates in those two regions. Logs are currently delayed by approximately one hour.
Posted Jun 27, 2019 - 14:58 UTC

Identified

We have successfully rolled back the proxy release in the ap-southeast-1 region, which has since seen a significant drop in 5xx errors.

We are still working on improving log ingestion rates and rolling back the proxy release in the following regions: ap-northeast-1, ap-southeast-2, sa-east-1, eu-west-1, eu-central-1, GCP us-west1 and GCP europe-west3.
Posted Jun 27, 2019 - 13:58 UTC

Investigating

We have noticed an increased rate of proxy restarts following an upgrade to our proxy layer. We are currently assessing the impact on our platform. Preliminary findings indicate slowdowns in log ingestion. The most affected regions are eu-west-1 and ap-southeast-1.

We have decided to perform a rollback and are currently working on it. We will provide more information in the next hour.
Posted Jun 27, 2019 - 13:31 UTC