
Team,

I've been playing around with Istio 1.7 and outlier detection; here are some weird things I found. My vs-dr.yaml:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
    - "recommendation-demo.com"
  gateways:
    - istio-system/monitoring-gateway
  http:
  - name: "other-account-route"
    route:
    - destination:
        host: recommendation
        subset: v2
      weight: 100
    - destination:
        host: recommendation
        subset: v1
      weight: 0
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: recomm-dr
spec:
  host: recommendation
  subsets:
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
      connectionPool:
        tcp: {}
        http: {}
      outlierDetection:
        consecutiveErrors: 2
        interval: 1s
        baseEjectionTime: 30s
        maxEjectionPercent: 10
  - name: v1
    labels:
      version: v1

So if outlier detection is not configured in the DestinationRule, load balancing works as expected:

kubectl -n micro exec -it $CLIENT_POD -c istio-proxy -- sh -c 'while true; do curl -L recommendation-demo.com; sleep 1; done'
recommendation v2 from 'recommendation-v2-57ddf9cd95-wb7rj': 45
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 851
recommendation v2 from 'recommendation-v2-57ddf9cd95-jtkrz': 44
recommendation v2 from 'recommendation-v2-57ddf9cd95-wb7rj': 46
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 852
recommendation v2 from 'recommendation-v2-57ddf9cd95-jtkrz': 45
recommendation v2 from 'recommendation-v2-57ddf9cd95-wb7rj': 47
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 853
recommendation v2 from 'recommendation-v2-57ddf9cd95-jtkrz': 46
recommendation v2 from 'recommendation-v2-57ddf9cd95-wb7rj': 48
recommendation v2 from 'recommendation-v2-57ddf9cd95-jtkrz': 47
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 854

But after I add this part:

outlierDetection:
  consecutiveErrors: 2
  interval: 1s
  baseEjectionTime: 30s
  maxEjectionPercent: 50

the only results I get are from a single pod:

recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1321
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1322
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1323
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1324
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1325
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1326
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1327
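
To debug a case like this, one option (a sketch; it reuses the $CLIENT_POD from the test loop above) is to dump the client sidecar's view of the cluster via the Envoy admin API:

# /clusters lists every known endpoint together with its health flags;
# an endpoint ejected by outlier detection is marked with
# health_flags::failed_outlier_check, while an endpoint missing entirely
# was never put in the load-balancing set.
kubectl -n micro exec $CLIENT_POD -c istio-proxy -- \
  pilot-agent request GET clusters | grep recommendation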

And BTW, after I add the outlier config and then scale out the deployment, the newest pod does receive traffic successfully:

recommendation v2 from 'recommendation-v2-57ddf9cd95-xhq4n': 32
recommendation v2 from 'recommendation-v2-57ddf9cd95-xhq4n': 33
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1364
recommendation v2 from 'recommendation-v2-57ddf9cd95-xhq4n': 34
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1365
recommendation v2 from 'recommendation-v2-57ddf9cd95-xhq4n': 35
recommendation v2 from 'recommendation-v2-57ddf9cd95-skkgd': 1366
recommendation v2 from 'recommendation-v2-57ddf9cd95-xhq4n': 36

So my questions are:

  1. Is this expected behaviour? In this case, let's say we have 3 pods in one ReplicaSet and we apply the outlier configs; will requests then only be routed to the youngest pod, recommendation-v2-57ddf9cd95-skkgd?
  2. We have the ReplicaSet and the outlier configs in place, then we add extra pods to the ReplicaSet; will they be load-balanced successfully?
  3. Does anyone have a working config for outlier detection? Much appreciated for any replies!
Ray Gao
  • In theory, the basic intent of outlier detection is to stop sending requests to an unhealthy instance and give it time to recover; in the meantime, requests are redirected to the healthy instances so that consumers are not impacted. Did you change something in your apps so they would act as unhealthy instances, for example like [here](https://youtu.be/OEo99GjUv6Q?t=261)? There is an [example](https://www.citrix.com/blogs/2020/07/15/outlier-detection-using-citrix-adc-in-istio-service-mesh/) with outlier detection. – Jakub Nov 04 '20 at 14:03
  • Hi @Jakub, no, all the pods are fresh and newly created for testing purposes. If I cross out the outlier configs from the dr, they start load balancing. I think it's something to do with this Istio default feature: https://istio.io/latest/docs/ops/configuration/traffic-management/locality-load-balancing/ – Ray Gao Nov 06 '20 at 07:47
  • Could you add your deployment and service for testing? I have tried with recommendation-v2 from this [github](https://github.com/redhat-scholars/istio-tutorial/tree/master/recommendation/kubernetes), configured it with your vs/dr, and it works, even after I add this outlierDetection part. About question number 2, I added 2 new replicas while sending traffic to the 3 other replicas, and when they were ready they started getting traffic too. – Jakub Nov 10 '20 at 11:24
  • Hi @Jakub, thanks for the reply. I've been researching these past days and I think this is an expected feature in Istio called locality load balancing; check this out: https://istio.io/latest/docs/ops/configuration/traffic-management/locality-load-balancing/. But I only see locality LB when traffic flows over the Istio ingress gateway; inside the mesh, traffic routes to pods in different zones randomly. – Ray Gao Nov 10 '20 at 12:16
  • And meanwhile, do you have any successful config for outlier detection? – Ray Gao Nov 10 '20 at 12:17
  • If you think it's locality load balancing then you can turn it off with `--set meshConfig.localityLbSetting.enabled=false` and check if that's it, but I think it's not related in this case. I haven't tested it yet on the recommendation deployment, but I wanted to use this [example](https://istio.io/latest/docs/tasks/traffic-management/circuit-breaking/) from the Istio documentation on it. – Jakub Nov 10 '20 at 12:21
  • Hi, that example only introduces circuit breaking for concurrent-connection overflow, not outlier detection. I want to see, let's say, 6 pods where I manually make 2 of them misbehave by returning 503; they should be ejected from the pool and added back after 30s (according to my configs above). – Ray Gao Nov 10 '20 at 12:55
  • So far, if I manually make 1 of the replicas return 503, it's ejected from the pool, so the traffic goes only to the other 4 healthy replicas; but I'm not sure if it's trying to add it back to the pool again. I will check that and let you know the results. – Jakub Nov 10 '20 at 13:20

2 Answers


I have created the example below based on the YouTube video and GitHub repository linked in the comments above.

It's based on one Deployment with a Service and the appropriate Gateway, VirtualService, and DestinationRule.

Tested on GKE with Istio 1.7.4.


Example YAMLs:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: recommendation
    version: v2
  name: recommendation-v2
spec:
  replicas: 2
  selector:
    matchLabels:
      app: recommendation
      version: v2
  template:
    metadata:
      labels:
        app: recommendation
        version: v2
      annotations:
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - env:
        - name: JAVA_OPTIONS
          value: -Xms15m -Xmx15m -Xmn15m
        name: recommendation
        image: quay.io/rhdevelopers/istio-tutorial-recommendation:v2.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        - containerPort: 8778
          name: jolokia
          protocol: TCP
        - containerPort: 9779
          name: prometheus
          protocol: TCP
        resources:
          requests:
            memory: "80Mi"
            cpu: "200m" # 1/5 core
          limits:
            memory: "120Mi"
            cpu: "500m"
        livenessProbe:
          exec:
            command:
            - curl
            - localhost:8080/health/live
          initialDelaySeconds: 5
          periodSeconds: 4
          timeoutSeconds: 1
        readinessProbe:
          exec:
            command:
            - curl
            - localhost:8080/health/ready
          initialDelaySeconds: 6
          periodSeconds: 5
          timeoutSeconds: 1
        securityContext:
          privileged: false

---


apiVersion: v1
kind: Service
metadata:
  name: recommendation
  labels:
    app: recommendation
spec:
  ports:
  - name: http
    port: 8080
  selector:
    app: recommendation


---            

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: my-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
      - "*"

---

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: recommendation
spec:
  hosts:
    - "*"
  gateways:
    - "my-gateway"
  http:
  - name: "other-account-route"
    route:
    - destination:
        host: recommendation
        subset: v2
      weight: 100


---

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: recomm-dr
spec:
  host: recommendation
  subsets:
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
      outlierDetection:
        consecutiveErrors: 1
        interval: 1s
        baseEjectionTime: 60s
        maxEjectionPercent: 100
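
As a side note, consecutiveErrors has since been deprecated in the Istio API in favor of more specific counters; a sketch of the roughly equivalent config on newer releases (assuming the errors you care about are 5xx, as in this test):

      outlierDetection:
        consecutive5xxErrors: 1   # replaces the deprecated consecutiveErrors
        interval: 1s
        baseEjectionTime: 60s
        maxEjectionPercent: 100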

1. Is this expected behaviour? In this case, let's say we have 3 pods in one ReplicaSet and we apply the outlier configs; will requests then only be routed to the youngest pod, recommendation-v2-57ddf9cd95-skkgd?

No, after you apply this outlierDetection it should work as before, unless the pods start returning 503s.

2. We have the ReplicaSet and the outlier configs in place, then we add extra pods to the ReplicaSet; will they be load-balanced successfully?

Yes, they should be load-balanced successfully.


Here is a test using the above YAMLs.

I added the outlierDetection below:

  outlierDetection:
    consecutiveErrors: 1
    interval: 10s
    baseEjectionTime: 90s
    maxEjectionPercent: 100

With 2 replicas and outlierDetection:

recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 1
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 1
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 2
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 2
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 3
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 3
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 4
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 4
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 5
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 5
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 6

With 2 replicas and outlierDetection, then adding 2 more replicas with kubectl scale deployment recommendation-v2 --replicas=4:

recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 15
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 17
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 16
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 18
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 17
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 19
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 18
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 20
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 19
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 20
recommendation v2 from 'recommendation-v2-7f76b4c8cc-ml9m7': 1
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 21
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 21
recommendation v2 from 'recommendation-v2-7f76b4c8cc-ml9m7': 2
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 22
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 22
recommendation v2 from 'recommendation-v2-7f76b4c8cc-ml9m7': 3
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 23
recommendation v2 from 'recommendation-v2-7f76b4c8cc-kvqjk': 1
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 23
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 24
recommendation v2 from 'recommendation-v2-7f76b4c8cc-ml9m7': 4
recommendation v2 from 'recommendation-v2-7f76b4c8cc-ml9m7': 5
recommendation v2 from 'recommendation-v2-7f76b4c8cc-kvqjk': 2
recommendation v2 from 'recommendation-v2-7f76b4c8cc-kvqjk': 3
recommendation v2 from 'recommendation-v2-7f76b4c8cc-6tvmj': 24
recommendation v2 from 'recommendation-v2-7f76b4c8cc-htz56': 25

The 2 new replicas, ml9m7 and kvqjk, start receiving traffic.


3. Does anyone have a working config for outlier detection? Much appreciated for any replies!

If I understand correctly how it should work, then the above example works correctly: if you manually make 1 of the pods return 503, it's ejected from the pool and added back after 90s.

Here is the way from the above video to make a recommendation replica return 503:

kubectl exec -ti recommendation-v2-7f76b4c8cc-6tvmj -c recommendation -- /bin/bash
bash-4.4# curl localhost:8080/misbehave
Following requests to / will return a 503

And if you start sending traffic, you can check the logs of the replica that returns 503 with:

kubectl logs recommendation-v2-7f76b4c8cc-6tvmj -c recommendation --tail 10

The ejected replica only receives a few requests every 90s: after Istio detects the 503s, the endpoint is ejected by outlierDetection, and after 90s Istio tries sending traffic to it again, repeating the cycle.
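
To confirm the ejection from the client side (a sketch, assuming the same client pod setup as in the question), you can grep Envoy's outlier-detection counters from the sidecar:

# ejections_active > 0 means an endpoint of the cluster is currently ejected;
# ejections_enforced_total grows each time an ejection is enforced.
kubectl -n micro exec $CLIENT_POD -c istio-proxy -- \
  pilot-agent request GET stats | grep outlier_detection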


Jakub

So this issue seems to be related to the locality load balancer.

When outlierDetection is not defined, locality failover is disabled, so locality is not used. That is why the load balancer works correctly.

But after setting outlierDetection, locality failover is enabled by default, so requests will be load-balanced within a locality.

If you want to be sure, disable it explicitly:

    loadBalancer:
      simple: RANDOM
      localityLbSetting:
        enabled: false
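
For context, here is that snippet placed in a full DestinationRule (a sketch that combines the fix with the recomm-dr from the question; adjust the subset and outlier values to your own):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: recomm-dr
spec:
  host: recommendation
  subsets:
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        simple: RANDOM
        localityLbSetting:
          enabled: false    # disable locality-aware load balancing for this subset
      outlierDetection:
        consecutiveErrors: 2
        interval: 1s
        baseEjectionTime: 30s
        maxEjectionPercent: 50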
Ay_mhwan