
I have a Go service that exposes a RESTful API. I have a middleware that tracks requests via two Prometheus metrics (a CounterVec and a Histogram): requests_total and request_duration. Every time a request comes in, the middleware calls two functions: one increments the counter and the other records the request's duration.

It has been happening sporadically: the service springs up, runs for a bit, throws the error, retries, and then eventually crashes and stops producing metrics. Please check the error below.

I want to know if anyone has encountered this error before and knows how to fix or debug it. I suspect a race condition, but I am not sure how to debug it with the knowledge and tools I have. For example: how do I replicate this issue via unit tests?

```
4 error(s) occurred:
* collected metric "omittedPart_request_total_6687cdb68f_2wqht" { label:<name:"method" value:"POS" > label:<name:"path" value:"/api/someService/user/:user_id" > label:<name:"servicename" value:"some-service" > label:<name:"status_code" value:"204" > counter:<value:1 > } was collected before with the same name and label values
* collected metric "omittedPart_request_total_6687cdb68f_2wqht" { label:<name:"method" value:"POST" > label:<name:"path" value:"/api/someService" > label:<name:"servicename" value:"some-service" > label:<name:"status_code" value:"200" > counter:<value:39 > } was collected before with the same name and label values
* collected metric "omittedPart_request_total_6687cdb68f_2wqht" { label:<name:"method" value:"GETT" > label:<name:"path" value:"/api/someService" > label:<name:"servicename" value:"some-service" > label:<name:"status_code" value:"200" > counter:<value:1 > } was collected before with the same name and label values
* collected metric "omittedPart_request_duration_seconds_6687cdb68f_2wqht" { label:<name:"method" value:"POST" > label:<name:"path" value:"/api/ujt" > label:<name:"servicename" value:"some-service" > label:<name:"status_code" value:"200" > histogram:<sample_count:39 sample_sum:0.38108896699999995 bucket:<cumulative_count:0 upper_bound:1e-09 > bucket:<cumulative_count:0 upper_bound:2e-09 > bucket:<cumulative_count:0 upper_bound:5e-09 > bucket:<cumulative_count:0 upper_bound:1e-08 > bucket:<cumulative_count:0 upper_bound:2e-08 > bucket:<cumulative_count:0 upper_bound:5e-08 > bucket:<cumulative_count:0 upper_bound:1e-07 > bucket:<cumulative_count:0 upper_bound:2e-07 > bucket:<cumulative_count:0 upper_bound:5e-07 > bucket:<cumulative_count:0 upper_bound:1e-06 > bucket:<cumulative_count:0 upper_bound:2e-06 > bucket:<cumulative_count:0 upper_bound:5e-06 > bucket:<cumulative_count:0 upper_bound:1e-05 > bucket:<cumulative_count:0 upper_bound:2e-05 > bucket:<cumulative_count:0 upper_bound:5e-05 > bucket:<cumulative_count:0 upper_bound:0.0001 > bucket:<cumulative_count:0 upper_bound:0.0002 > bucket:<cumulative_count:0 upper_bound:0.0005 > bucket:<cumulative_count:0 upper_bound:0.001 > bucket:<cumulative_count:0 upper_bound:0.002 > bucket:<cumulative_count:0 upper_bound:0.005 > bucket:<cumulative_count:33 upper_bound:0.01 > bucket:<cumulative_count:38 upper_bound:0.02 > bucket:<cumulative_count:38 upper_bound:0.05 > bucket:<cumulative_count:39 upper_bound:0.1 > bucket:<cumulative_count:39 upper_bound:0.2 > bucket:<cumulative_count:39 upper_bound:0.5 > bucket:<cumulative_count:39 upper_bound:1 > bucket:<cumulative_count:39 upper_bound:2 > bucket:<cumulative_count:39 upper_bound:5 > bucket:<cumulative_count:39 upper_bound:10 > bucket:<cumulative_count:39 upper_bound:15 > bucket:<cumulative_count:39 upper_bound:20 > bucket:<cumulative_count:39 upper_bound:30 > > } was collected before with the same name and label values
```
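For intuition about what this message means: at scrape time the Prometheus registry gathers samples from every registered collector and rejects the whole batch if two samples share the same metric name and label values, which is why the failure shows up at gather/scrape time rather than when Inc() is called. A toy stdlib-only model of that check (not the real client_golang code, just an illustration of the mechanism):

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// sample is one time series: a metric name plus its label values.
type sample struct {
	name   string
	labels map[string]string
}

// key builds the identity the registry checks: name + sorted label pairs.
func (s sample) key() string {
	keys := make([]string, 0, len(s.labels))
	for k := range s.labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(s.name)
	for _, k := range keys {
		fmt.Fprintf(&b, `{%s=%q}`, k, s.labels[k])
	}
	return b.String()
}

// gather mimics the registry's duplicate check: if two collectors emit
// samples with the same name and label values, the whole scrape fails,
// as in "was collected before with the same name and label values".
// Registering the same metric twice (e.g. creating a CounterVec per
// request instead of once at startup) produces exactly this situation.
func gather(collectors ...[]sample) error {
	seen := map[string]bool{}
	for _, c := range collectors {
		for _, s := range c {
			k := s.key()
			if seen[k] {
				return errors.New("collected metric " + k +
					" was collected before with the same name and label values")
			}
			seen[k] = true
		}
	}
	return nil
}
```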

More Context

  • The services are running in a kubernetes cluster as pods (via deployment)
  • The metrics I am exposing are unique to each pod: I am appending a GUID to the metric name
  • My implementation is similar to the basic implementation linked here
  • We use Fiber (gofiber) to create our APIs and call the metric-updating functions from a custom HTTP handler used as middleware
  • We are using client_golang v1.15.0 as our metrics library

Any help would be appreciated. Thank you!

  • I tried adding a lock (via mutex) around the functions updating the metrics, but the error still happens
rolldawg
  • [Divide and conquer in debugging](https://betterprogramming.pub/find-and-fix-bugs-like-a-pro-with-divide-and-conquer-d55f3cf91154). Check whether the same thing happens when: metrics collection is disabled, duration logging is disabled, only one of the metrics is collected, and so on. – markalex Apr 27 '23 at 09:58

0 Answers