
I want to use my already existing Prometheus and Grafana instances in the monitoring namespace to emulate what seldon-core-analytics is doing. I'm using the prometheus-community Helm charts and have installed kube-prometheus-stack on Kubernetes. Here's what I've done so far:

In the values.yaml file, under the prometheus config, I added the following annotations:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/prometheus"

Next, I looked at the prometheus-config.yaml in their GitHub repo and copied the configuration into a ConfigMap.
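
Roughly, that ended up as a ConfigMap like this (the resource name below is just for the sketch, not the exact one from the repo):

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf     # illustrative name
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    # scrape_configs copied from seldon-core-analytics' prometheus-config.yaml go here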

I also created a ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: seldon-servicemonitor-default
  labels:
    seldon-monitor: seldon-default
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: seldon-core
  endpoints:
    - interval: 15s
      path: /metrics
      port: http
    - interval: 15s
      path: /prometheus
      port: http
  namespaceSelector:
    matchNames:
      - seldon
      - default
      - monitoring

No errors with the above steps so far, but the Prometheus instance doesn't appear to be scraping the metrics from a model I deployed in a different namespace. What other configuration do I need so that my own Prometheus and Grafana instances can gather and visualize the metrics from my Seldon-deployed models? The documentation doesn't really explain how to do this with your own instances, and the setup provided through seldon-core-analytics isn't production-ready.

Riley Hun

2 Answers


Prometheus configuration in seldon-core-analytics is quite standard. It is based on built-in Kubernetes service discovery and it uses annotations to find scraping targets:

annotations:
  prometheus.io/scrape: "true"
  prometheus.io/path: "/metrics"
  prometheus.io/scheme: "http"
  prometheus.io/port: "9100"

In their example configuration Prometheus targets pods, services, and endpoints that carry the prometheus.io/scrape: "true" annotation. The other three annotations override the default scraping parameters per target. Thus, if you have a config like the example, you only need to put some of these annotations on your pods.
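
For illustration, the relevant part of such a config looks roughly like this - a minimal sketch of the usual pod-annotation discovery pattern, not a literal copy of the seldon file:

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # keep only pods that opted in via the annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # let the pod override the metrics path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # let the pod override the scrape port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__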

The way kube-prometheus-stack works is different. It uses the Prometheus Operator and CRDs to shape the configuration. This design document describes the purpose of each CRD.

You need to create a ServiceMonitor resource in order to define a scraping rule for new services. The ServiceMonitor itself must carry the labels that the Prometheus resource (another CRD) selects under its serviceMonitorSelector key. It is hard to provide a guaranteed working example in these circumstances, but this short guide should be enough to understand what to do.
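
For example, with a Helm release named kube-prometheus-stack, the Prometheus resource usually carries a selector like the fragment below (the release value is an assumption - check yours with kubectl get prometheus -n monitoring -o yaml):

# Fragment of the Prometheus resource created by kube-prometheus-stack
spec:
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack   # matches the Helm release name by default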

I suggest you describe one of the ServiceMonitors you already have, then create a new one, changing the labels under matchLabels. Do not change the namespace of the new object; the Prometheus Operator does not look for ServiceMonitors in other namespaces by default. To make the ServiceMonitor discover targets in all namespaces, set namespaceSelector.any to true:

spec:
  namespaceSelector:
    any: true
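
Putting it together, a sketch of a ServiceMonitor the operator should pick up could look like this (the release label, port name, and metrics path are assumptions to adapt to your setup):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: seldon-servicemonitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: seldon-core
  namespaceSelector:
    any: true                        # discover matching Services in all namespaces
  endpoints:
    - port: http                     # the *name* of a port on the Service
      path: /prometheus              # adjust if your model exposes metrics elsewhere
      interval: 15s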
anemyte
  • Thanks @anemyte. I've already looked through the references you provided. I tried with a ServiceMonitor but it still wasn't scraping the seldon deployments as targets. I have edited my original question with the ServiceMonitor file. – Riley Hun Feb 02 '21 at 20:09
  • @RileyHun Seems legit. What have you checked so far? Prometheus configuration/targets in the web UI? Is there anything related to seldon? Prometheus Operator logs may have some valuable information if there is a problem with the CRD. Finally, some CRDs (not sure about these in particular) may show hints in their status (available with kubectl describe). – anemyte Feb 02 '21 at 20:26
  • @anemyte - I've only checked the Prometheus UI and looked at the targets under Status. Maybe it has something to do with RBAC authorization? I'm looking through this article: https://medium.com/kubernetes-tutorials/simple-management-of-prometheus-monitoring-pipeline-with-the-prometheus-operator-b445da0e0d1a – Riley Hun Feb 02 '21 at 20:31
  • @RileyHun I see RBAC roles and bindings are defined in the helm chart, but I can't tell what's in your cluster :). I suggest you delete and re-create the ServiceMonitor. While doing so, capture logs of both Prometheus and the operator; there may be something about the CRD, maybe an error or a warning. Also, check the Prometheus configuration in the UI (available under 'Status'). Can you see anything there related to seldon or its ServiceMonitor? Lastly, just to be sure, do you have a Service object for your seldon model? – anemyte Feb 02 '21 at 20:52
  • Yes, there is a Service object for the seldon model. RBAC roles are already defined? I missed that - I was going to create another one. I've already deleted my ServiceMonitor. – Riley Hun Feb 02 '21 at 21:31
  • Now all my targets disappeared after I created a new ServiceMonitor. I was following the instructions from this thread: https://stackoverflow.com/questions/63606347/add-new-service-metrics-to-prometheus-operator – Riley Hun Feb 02 '21 at 21:59
  • I got Prometheus connecting to a generic app, although all the other targets disappeared. I'm not sure what happened there. Maybe I replaced the default ServiceMonitor that was scraping the monitoring namespace? – Riley Hun Feb 02 '21 at 23:14
  • @RileyHun It's possible, but there are many things that can go wrong. How many ServiceMonitors are there? `kubectl get servicemonitors -n monitoring` - if there is only one, then it seems you did replace it. – anemyte Feb 03 '21 at 05:32

ServiceMonitors are extremely difficult to debug. My debugging strategy would be:

  1. Check whether the ServiceMonitor you created is being read by Prometheus: look at the /targets URL (there should be a target in a 0/0 state at least). If not, the ServiceMonitor itself is not being picked up by Prometheus. I suggest looking into the following settings in your kube-prometheus-stack configuration:

        serviceMonitorSelectorNilUsesHelmValues: false
        serviceMonitorSelector: {}
        serviceMonitorNamespaceSelector: {} 
    

    The default ServiceMonitors have Helm metadata attached to them, which the Prometheus Operator uses to filter/choose the ServiceMonitors to monitor. Setting serviceMonitorSelectorNilUsesHelmValues: false disables that filtering (see the values sketch at the end of this answer).

  2. If the ServiceMonitor is visible under targets but it lists no targets: in this case the issue lies between the ServiceMonitor and the pods it is trying to scrape. Check that the ports you mentioned are accessible and that the pods fulfil the selectors, as in the Service sketch below.
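
For reference, here is a sketch of the kind of Service the ServiceMonitor would have to match; the names, labels, and port are assumptions for illustration, not Seldon's exact defaults:

apiVersion: v1
kind: Service
metadata:
  name: my-model-default
  namespace: seldon
  labels:
    app.kubernetes.io/managed-by: seldon-core   # what the ServiceMonitor selector matches
spec:
  selector:
    app: my-model-default                       # must match the model pod's labels
  ports:
    - name: http        # the ServiceMonitor endpoint refers to this name, not the number
      port: 8000
      targetPort: 8000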

My advice would be to start with another dummy ServiceMonitor by following this, and then modify it one step at a time until it starts monitoring the seldon-core-analytics pods.
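
For completeness, these keys typically sit under prometheus.prometheusSpec in the kube-prometheus-stack values.yaml (verify the nesting against your chart version):

prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelector: {}
    serviceMonitorNamespaceSelector: {}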

rohatgisanat