
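Latency, traffic, errors, and saturation are the four golden signals of monitoring: how long requests take, how much demand the system is serving, how often requests fail, and how close the service is to its capacity. If you can measure only four metrics for a user-facing system, focus on these.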
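
To make the four signals concrete, here is a minimal Python sketch that summarizes them for a single observation window. The `Request` record, the `golden_signals` function, and the choice of CPU usage as the saturation measure are all assumptions made for illustration. A real service would normally get these numbers from its monitoring system rather than compute them by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request."""
    latency_ms: float  # time taken to serve the request
    status: int        # HTTP-style status code


def golden_signals(requests, window_seconds, cpu_used, cpu_capacity):
    """Summarize latency, traffic, errors, and saturation for one window."""
    if not requests:
        raise ValueError("need at least one request")

    # Latency: 95th percentile using the nearest-rank method.
    latencies = sorted(r.latency_ms for r in requests)
    rank = math.ceil(0.95 * len(latencies))
    p95 = latencies[rank - 1]

    # Errors: here, any 5xx status counts as a failed request (an assumption).
    failed = sum(1 for r in requests if r.status >= 500)

    return {
        "latency_p95_ms": p95,
        "traffic_rps": len(requests) / window_seconds,  # requests per second
        "error_rate": failed / len(requests),           # fraction that failed
        "saturation": cpu_used / cpu_capacity,          # fraction of capacity in use
    }
```

Saturation depends on whichever resource is most constrained for the service, so memory, disk I/O, or queue depth could stand in for CPU here.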

Beyond these signals, SRE rests on four pillars:

  1. Service level objectives (SLOs) – Quantitative, objective targets that define an acceptable level of service.
  2. Monitoring – Collecting metrics about the service to understand how it's performing.
  3. Emergency response – Processes for responding to incidents as they occur, so that customer impact is minimized.
  4. Change management – Processes for planning, testing and deploying code or configuration changes to ensure high quality and minimize risk.
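
As an illustration of how an SLO turns into a number you can act on, the sketch below checks an availability target against request counts and reports how much of the allowed failure margin (often called an error budget) is left. The function name, its parameters, and the request-based definition of availability are assumptions for this example, not a prescribed method.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Compare observed availability with an SLO target.

    slo_target is a fraction, e.g. 0.999 for "99.9% of requests succeed".
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # Observed availability: fraction of requests that succeeded.
    availability = 1 - failed_requests / total_requests

    # Number of failures the SLO tolerates over this many requests.
    allowed_failures = (1 - slo_target) * total_requests

    return {
        "availability": availability,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "slo_met": availability >= slo_target,
    }
```

A negative `remaining_failures` value means the service has failed more requests than the target permits, which is typically a cue to slow down risky changes until reliability recovers.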

These four pillars form the foundation of Site Reliability Engineering (SRE), which aims to build and operate large-scale, highly reliable systems. Together they help keep services available, reliable, and within performance expectations by combining proactive monitoring and change management with reactive incident response.


