Highest Voted 'sre' Questions

4

votes

1 answer

PromQL query to calculate service uptime & downtime from a fixed date

I'm trying to build a basic SRE dashboard in order to learn Prometheus/Grafana. I want to calculate the number of hours the service has been running and the number of hours it's been down since 1 January of the current year so that I can reduce…

asked Jun 12 '21 at 18:31

user9492428

3

votes

1 answer

Manage Dataproc cluster access using service account and IAM roles

I am a beginner in cloud and would like to limit my Dataproc cluster's access to given GCS buckets in my project. Let's say I have created a service account named 'data-proc-service-account@my-cloud-project.iam.gserviceaccount.com' and then I…

apache-spark google-cloud-platform google-cloud-storage google-cloud-dataproc sre

asked Jul 29 '20 at 01:33

vikrant rana

2

votes

1 answer

Conditions to check whether an Aerospike cluster is idle

Assuming Aerospike is running, I need some conditions with which to check whether the Aerospike cluster is idle and not being used at all. I tried checking the log files, but they also log the heartbeat, so even if Aerospike is not running it will generate…

linux aerospike sre aerospike-ce

asked Apr 29 '22 at 05:15

Sujay_ks

2

votes

1 answer

How do I measure error budget consumption for rolling windows?

I have an SLO for one application where 95% of service response times must be less than 450ms over a rolling 24-hour window. I sample once every 60 seconds. Typically my "current service level" is around 96-97%. If the service level falls below 95%…

prometheus dashboard reliability sre

asked Dec 09 '21 at 13:31

Miked
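
The figures in this question (95% target, 450ms threshold, one sample every 60 seconds, 24-hour window) are enough for a rough sketch of the arithmetic. The Python below is an illustration only, not the accepted answer: the class name `RollingBudget` and its methods are hypothetical, and it assumes each sample is a single latency reading. A 24-hour window at one sample per minute holds 1440 samples, so a 95% target tolerates about 72 slow samples per window.

```python
from collections import deque

WINDOW_SAMPLES = 24 * 60   # one sample per 60 s over a rolling 24 h window
SLO_TARGET = 0.95          # 95% of samples must meet the threshold
THRESHOLD_MS = 450         # a sample is "good" if it is below this latency


class RollingBudget:
    """Hypothetical helper: tracks good/bad samples over a rolling window."""

    def __init__(self, window=WINDOW_SAMPLES, target=SLO_TARGET):
        # deque(maxlen=...) drops the oldest sample when a new one arrives,
        # which is what makes the window roll.
        self.samples = deque(maxlen=window)
        self.target = target

    def record(self, latency_ms):
        self.samples.append(latency_ms < THRESHOLD_MS)

    def service_level(self):
        # Fraction of samples in the window that met the threshold.
        return sum(self.samples) / len(self.samples)

    def budget_consumed(self):
        # Bad samples seen, divided by the bad samples the target allows.
        # 1.0 means the budget for the current window is fully spent.
        bad = len(self.samples) - sum(self.samples)
        allowed = (1 - self.target) * len(self.samples)
        return bad / allowed


if __name__ == "__main__":
    rb = RollingBudget()
    for i in range(WINDOW_SAMPLES):
        rb.record(500 if i < 43 else 100)  # 43 slow samples out of 1440
    print(f"service level:   {rb.service_level():.4f}")
    print(f"budget consumed: {rb.budget_consumed():.2%}")
```

With a service level of roughly 97%, as in the question, a little under 60% of the window's budget is spent. Both methods assume at least one sample has been recorded.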

1

vote

1 answer

RBAC for Infrastructure Engineer

I feel this is a rather basic question, but somehow I'm unable to find a good answer. Recently, auditors have been complaining about the role-based access control for our cloud set-up. My team is responsible for the Cloud infrastructure (aka Cloud…

architecture cloud devops rbac sre

asked Jun 01 '22 at 13:09

Herman

1

vote

1 answer

Can Services in GCP's Monitoring monitor endpoints?

I installed managed Anthos on a GKE cluster. Anthos Service Mesh is working and is displaying my API. Thanks to that, the Services in Monitoring automatically detect my API. This is great as it enables me to easily set SLOs and Error Budget for…

google-cloud-platform monitoring google-anthos sre

asked Mar 21 '22 at 18:13

Marcin Kulik

1

vote

1 answer

Can TTFB be affected after page load?

In the case of server-side rendering, we know that TTFB is the time between the start of the request and the start of the response. My question is: can the TTFB be affected if the page visually updates due to filters or something but is not a…

html performance newrelic sre

asked Jan 13 '22 at 19:08

user14199036

1

vote

0 answers

What and where is this class 'UniversalScalabilityLawForecast' in Micrometer library?

I'm reading 'SRE with Java Microservices' (O'Reilly): "USL forecasting is a form of “derived” Meter in Micrometer and can be enabled as shown in Example 4-39." Example 4-39. Universal scalability law forecast configuration in…

devops monitoring micrometer spring-micrometer sre

asked Jan 09 '22 at 14:46

BY-J

1

vote

0 answers

What do page and pager mean in an SRE context?

I've been reading the Google SRE Book and I've found the words page and pager in multiple places. In this context, what do they mean? See link. Thank you.

sre

asked May 25 '21 at 12:35

Iván Casanova

1

vote

0 answers

Is the error budget in GCP UI supposed to rise above 100%?

I have just started using SLOs in GCP and my first SLI seems to be working, but the "error budget" field is way above 100%. All the examples I have seen online sit at 100%, whereas mine seems to float from 700.00% up into the thousands.…

google-cloud-platform google-cloud-monitoring sli sre

asked Dec 29 '20 at 21:11

Cameron

1

vote

1 answer

How to avoid "Positive Feedback Cycle Overload Problem"?

Sometimes while designing reliable systems, we try to make the system more reliable by adding retries in the event of failure (with feedback mechanisms). This creates the potential for an overload because we may be adding more load to an already…

google-cloud-platform high-availability system-design reliability sre

asked Dec 15 '20 at 17:02

Stalin Rijal
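
One commonly cited way to keep retries from amplifying load is to bound the number of attempts and spread them out with capped exponential backoff plus random jitter. The Python below is a minimal sketch of that idea under stated assumptions, not the answer given to this question: `call_with_retries` and its parameters are hypothetical names, and catching every `Exception` is a simplification for illustration.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base=0.1, cap=5.0,
                      sleep=time.sleep):
    """Run operation(), retrying on failure with capped, jittered backoff.

    Hypothetical helper for illustration. `sleep` is injectable so the
    behaviour can be checked without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                # Attempt budget exhausted: surface the error instead of
                # piling more requests onto a struggling dependency.
                raise
            # Backoff ceiling doubles each attempt but never exceeds `cap`.
            ceiling = min(cap, base * 2 ** attempt)
            # Full jitter: a random wait in [0, ceiling] keeps many clients
            # from retrying in lockstep.
            sleep(random.uniform(0, ceiling))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("temporary failure")
        return "ok"

    print(call_with_retries(flaky), "after", calls["n"], "calls")
```

The bounded attempt count limits how much extra load one caller can add, and the jitter avoids synchronized retry waves. Server-side measures such as load shedding are a separate topic and are not shown here.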

1

vote

0 answers

SLO compliance report according to google SRE book

I want to create an SLO compliance report like the one the Google SRE handbook shows here: https://landing.google.com/sre/workbook/chapters/implementing-slos/#slo-compliance-report As shown in the description: the numbers in parentheses indicate the number…

performance sli sre

asked Aug 05 '19 at 18:59

zubug55

1

vote

1 answer

How do we measure the site availability?

To measure the availability of a web site / API, should the dependencies also be considered? For instance, assume the payment service is down, but the shopping site is still available. Here the customer is not able to complete the purchase since the…

availability sre

asked Mar 27 '19 at 03:39

programmer

0

votes

1 answer

docker unable to delete default network

When I start the docker-compose file all containers are working fine. Docker File: services: db: container_name: postgresql environment: POSTGRES_DB: sonar POSTGRES_PASSWORD: sonar POSTGRES_USER: sonar hostname:…

linux docker docker-compose devops sre

asked Jul 04 '23 at 06:28

Mayur Dagdi

0

votes

0 answers

How to put Grafana into maintenance mode?

Is there any way to put Grafana in maintenance mode? I want to show the details of the planned maintenance window in the Grafana UI for all users. How can we do it? Where can we show text like the following: The Observability platform will be on…

devops grafana sre

asked Jun 21 '23 at 04:43

SHC
