Spring Boot Cloud + Ribbon + Feign + Hystrix + Zookeeper: what is going on with retries and failures?

Question

I am trying to create a few services with Spring Boot (1.5.1) using Ribbon + Feign + Hystrix (my service discovery is spring-boot-zookeeper), and I don't use Zuul.

I (naively) assumed it would work as follows:

Calling a Feign method (on an interface annotated with @FeignClient) turns the call into an HTTP request that Ribbon load-balances. If sending the request fails, Ribbon retries on the next instance of the same service according to its configuration (e.g. myservice.ribbon.MaxAutoRetriesNextServer=2), and only if all retries fail is the Hystrix fallback method called.

Here is my Feign interface:

@FeignClient(value = "myservice", fallbackFactory = HystrixMyServiceFallbackFactory.class)
@RibbonClient(name = "myservice")
public interface MyServiceClient {
    @RequestMapping(value = "/foo", method = RequestMethod.POST)
    Response foo(Object data);
}

I defined a Hystrix FallbackFactory that returns a default response:

public class HystrixMyServiceFallbackFactory implements FallbackFactory<MyServiceClient> {

    @Override
    public MyServiceClient create(final Throwable throwable) {
        return new MyServiceClient() {

            @Override
            public Response foo(Object data) {
                return new Response(-1, "Failed");
            }
        };
    }
}

Somewhere in my code I have the following lines:

@Autowired
private MyServiceClient myServiceClient;

public Response doSomething() {
   return myServiceClient.foo(new Object());
}

When all services are up (I have 2 instances of MyService), Ribbon round-robins between them as expected. But when I shut down one of the MyService instances, Ribbon keeps round-robining, so every second call returns the Hystrix fallback result instead of the expected success (Ribbon should retry on the other instance, shouldn't it?). This lasts until the Ribbon server list is updated.

Could anybody explain how all of this works together?

score 1 · Answer 1 · edited Jun 20 '20 at 09:12


This is probably what you're looking for: https://stackoverflow.com/a/29171396/873590

If you are using Ribbon, you can set properties similar to the following (substituting your service ID for "localapp"):
localapp.ribbon.MaxAutoRetries=5
localapp.ribbon.MaxAutoRetriesNextServer=5
localapp.ribbon.OkToRetryOnAllOperations=true
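
To make the effect of these two retry properties concrete, here is a toy model in Java. It is not Ribbon's implementation; the class name, the host names, and the simplification that "next server" means the next entry in a fixed list are all assumptions made for illustration. It only shows the order of attempts: the chosen instance is tried up to 1 + MaxAutoRetries times, then up to MaxAutoRetriesNextServer further instances are tried before the caller sees a failure.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Toy model of client-side retry order; not Ribbon's actual code. */
final class RetryOrderSketch {

    /**
     * Tries the round-robin pick up to (1 + maxAutoRetries) times, then moves on
     * to at most maxAutoRetriesNextServer further servers.
     * Returns the server that answered, or null when every attempt failed
     * (the point at which a Hystrix fallback would take over).
     */
    static String call(List<String> servers, Set<String> down, int firstIndex,
                       int maxAutoRetries, int maxAutoRetriesNextServer) {
        for (int hop = 0; hop <= maxAutoRetriesNextServer; hop++) {
            String server = servers.get((firstIndex + hop) % servers.size());
            for (int attempt = 0; attempt <= maxAutoRetries; attempt++) {
                if (!down.contains(server)) {
                    return server; // request succeeded on this instance
                }
            }
        }
        return null; // all attempts exhausted
    }

    public static void main(String[] args) {
        // Hypothetical instances; host-a has been shut down.
        List<String> servers = Arrays.asList("host-a:8080", "host-b:8080");
        Set<String> down = Collections.singleton("host-a:8080");

        // No retry on another server: the request aimed at host-a fails.
        System.out.println(call(servers, down, 0, 0, 0));
        // One "next server" retry: the same request is answered by host-b.
        System.out.println(call(servers, down, 0, 0, 1));
    }
}
```

With retries effectively off, every request that round-robin sends to the dead instance fails, which matches the "every second call hits the fallback" symptom in the question.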

answered May 26 '17 at 04:04 by dsep · edited Jun 20 '20 at 09:12 by Community

score 0 · Answer 2 · answered May 26 '17 at 08:05

There are several things that you need to check.

First, check that your Maven/Gradle build file includes the spring-retry dependency. Feign + Ribbon retry requires spring-retry, which is now an optional dependency, so without it on the classpath retries are not supported.

If retries still don't occur after adding spring-retry, check the original exception message. To do that, remove the fallback and inspect the message. If it says that myservice timed out, a Hystrix timeout exception occurred.

The Hystrix timeout defaults to 1,000 ms, which is sometimes not enough to retry HTTP requests if you have large read/connect timeouts. If so, try raising the Hystrix timeout as below.
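
One quick way to confirm the dependency actually ended up on the runtime classpath is to look for a class that ships in the spring-retry jar. The sketch below is an assumption-laden check, not an official diagnostic: the class name SpringRetryCheck is made up, and probing for RetryTemplate is only a reasonable proxy for "spring-retry is present".

```java
/** Checks whether a class from the spring-retry jar can be found at runtime. */
final class SpringRetryCheck {

    // A core class of the spring-retry library.
    static final String RETRY_TEMPLATE =
            "org.springframework.retry.support.RetryTemplate";

    static boolean isOnClasspath(String className) {
        try {
            // 'false' = locate the class without running its static initializers.
            Class.forName(className, false, SpringRetryCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("spring-retry present: " + isOnClasspath(RETRY_TEMPLATE));
    }
}
```

Run it from inside the application (for example from a startup hook), so that it sees the same classpath as the Feign client.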

hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 10000

The property above changes the default timeout for all Hystrix commands to 10 seconds, which is usually large enough. Set it to a value that leaves enough time for all the retries.

You can also change the Hystrix timeout for a specific circuit breaker. With Feign, every method has its own circuit breaker, and in your case its name looks like this:
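
As a rough way to pick that value, the worst case can be estimated from the Ribbon settings. The sketch below uses a common rule of thumb, (connect timeout + read timeout) × (MaxAutoRetries + 1) × (MaxAutoRetriesNextServer + 1); it ignores backoff and scheduling overhead, and the timeout numbers in main are illustrative assumptions, not values from the question.

```java
/** Rule-of-thumb estimate of how long all Ribbon attempts can take. */
final class HystrixTimeoutBudget {

    /** Initial try plus same-server retries, repeated for each server tried. */
    static int totalAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    /** Worst case if every attempt uses its full connect and read timeout. */
    static long worstCaseMillis(long connectTimeoutMs, long readTimeoutMs,
                                int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (connectTimeoutMs + readTimeoutMs)
                * totalAttempts(maxAutoRetries, maxAutoRetriesNextServer);
    }

    public static void main(String[] args) {
        // Assumed values: 1 s connect, 2 s read, no same-server retry, 2 other servers.
        long worstCase = worstCaseMillis(1000, 2000, 0, 2);
        System.out.println(totalAttempts(0, 2) + " attempts, up to " + worstCase + " ms");
        // prints "3 attempts, up to 9000 ms", which fits under a 10000 ms Hystrix timeout
    }
}
```

If the estimate exceeds the Hystrix timeout, Hystrix can cut the call off and run the fallback before Ribbon has finished retrying.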

MyServiceClient#foo(Object)

So you can change that circuit breaker's timeout as below.

hystrix:
  command:
    MyServiceClient#foo(Object):
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 10000
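
If you are unsure what the key is for another method, the naming pattern shown above (interface simple name, '#', method name, parameter simple names in parentheses) can be reproduced with plain reflection. This is a sketch that mimics the pattern for simple, non-generic parameters; it does not call Feign, and the nested stand-in interface with a String return type is an assumption made only to keep the example self-contained.

```java
import java.lang.reflect.Method;

/** Rebuilds a key of the form Type#method(ParamType,...) via reflection. */
final class CommandKeySketch {

    // Stand-in for the question's Feign interface (return type simplified).
    interface MyServiceClient {
        String foo(Object data);
    }

    static String commandKey(Class<?> type, String methodName) {
        for (Method method : type.getDeclaredMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            StringBuilder key = new StringBuilder(type.getSimpleName())
                    .append('#').append(method.getName()).append('(');
            Class<?>[] params = method.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    key.append(','); // no space between parameter types
                }
                key.append(params[i].getSimpleName());
            }
            return key.append(')').toString();
        }
        throw new IllegalArgumentException("No method named " + methodName);
    }

    public static void main(String[] args) {
        System.out.println(commandKey(MyServiceClient.class, "foo"));
        // prints "MyServiceClient#foo(Object)"
    }
}
```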

Thanks, I found this configuration. But in my opinion it could be a bit problematic in a microservice environment. If I have 5 instances of "service" and 2 of them are slow, it won't solve the problem: either I disable all "service" components (via Hystrix) or I disable Hystrix. After thinking about it a bit, I decided to solve it differently: improve the health indicator so that a "bad" service is quickly marked as unhealthy and removed from the list by service discovery. I still have the "problem" of slow services, but maybe I should define what counts as slow and mark those as "bad" too. - Mikhail Grinfeld, May 27 '17 at 10:34
