5

Initially I thought the obvious choice was to run multiple services behind a single ALB listener, using different path patterns to distribute API calls appropriately. When it comes to health checks, though (if one of those services goes down), I don't know of a smart way to divert traffic for just that one service to a different region.

If I have an active-active setup with weighted Route 53 records that fail over on a health check, I don't see any option other than to either cut off that entire ALB's traffic and divert it to another region, or ignore the one down service and continue to send traffic to the partially failing ALB.

Having a one-to-one mapping of ALBs to services solves this problem, but it adds overhead in terms of cost and complexity.

What is the recommended pattern to follow for an active-active microservices architecture?

user2254140
  • We wrestled with this for a while when we decided to migrate our services to ALB path-based routing. For active-active we run multiple ECS clusters behind the ALB. Support services like OAuth 2.0 reside in one cluster with multiple tasks spread across EC2 instances. Another cluster handles the majority of the light services, again with multiple tasks per service spread over at least 2 EC2s at a time. For failover to another region, we use a warm site for now. If an event is declared, we cut DNS at that time. What are your requirements for uptime and RTO in the event of failure? – Crutches Jan 31 '18 at 01:59
  • When you say you cut DNS, are you saying that you completely switch traffic from that ALB to an ALB in another region? My scenario involves around 10 services serving a good deal of traffic, and completely cutting over all service traffic to another "warm" region is really something that I would like to avoid. Requirements for uptime should be as close to 100% as I can get. – user2254140 Jan 31 '18 at 03:37
  • 1
    per AWS Support: "From my tests I can see that it is not possible for R53 to fail the traffic on a per service basis for the services associated with a ALB listener. You can only implement a failover for the entire ALB which will consequently lead to failing over of all the services associated with an ALB." – user2254140 Jan 31 '18 at 03:38
  • Yep, exactly, when an event occurs, we fail over to the warm site. This is a business continuity requirement from our industry regulators. 100% uptime is always the goal, but you're at the mercy of your cloud provider. Our warm site is there for events like when S3 and Lambda went down last year. Within the main region, we take advantage of redundant tasks running on separate machines, preferably in different AZs. We use multiple smaller clusters, running like-traffic services. We've found this to be most cost effective. Sorry I can't be of more help. – Crutches Jan 31 '18 at 03:51

1 Answer

0

If all of the services are accessed under a single hostname then the DNS of course must point to exactly one place, so rerouting is fundamentally an all-or-nothing prospect.

However, there's an effective workaround.

Configure a "secret" hostname for each service. ("Secret" in the sense that the client does not need to be aware of it.) We'll call these "service endpoints." The purpose of these hostnames is to route requests to each service: svc1.api.example.com, svc2.api.example.com, etc.

Configure each of these hostnames as a pair of Route 53 failover records pointing to the primary and failover load balancers, with a Route 53 health check that specifically probes that one service behind each balancer.
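As a concrete illustration, here is a minimal sketch in Python of the change batch you would hand to the Route 53 API for one service endpoint. All hostnames, the health check ID, and the hosted zone ID are hypothetical placeholders; a real setup might also use Alias records instead of CNAMEs. The key detail is that only the PRIMARY record carries the service-specific health check, so only that service fails over.

```python
# Build a PRIMARY/SECONDARY failover record pair for one service endpoint.
# The health check (hypothetical ID) probes only this service's health path,
# so an unhealthy svc1 shifts svc1's traffic alone to the other region.

def failover_record_pair(service, primary_alb, secondary_alb, health_check_id):
    """Return the two failover ResourceRecordSets for one service endpoint."""
    common = {
        "Name": f"{service}.api.example.com",  # the "secret" service endpoint
        "Type": "CNAME",
        "TTL": 60,  # short TTL so failover propagates quickly
    }
    return [
        {**common,
         "SetIdentifier": f"{service}-primary",
         "Failover": "PRIMARY",
         "HealthCheckId": health_check_id,  # checks this one service only
         "ResourceRecords": [{"Value": primary_alb}]},
        {**common,
         "SetIdentifier": f"{service}-secondary",
         "Failover": "SECONDARY",  # served only when the primary is unhealthy
         "ResourceRecords": [{"Value": secondary_alb}]},
    ]

change_batch = {
    "Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": rrs}
        for rrs in failover_record_pair(
            "svc1",
            "primary-alb-123.us-east-1.elb.amazonaws.com",   # placeholder
            "standby-alb-456.us-west-2.elb.amazonaws.com",   # placeholder
            "hc-svc1-0000",  # hypothetical health check ID
        )
    ]
}

# Applying it would require credentials, e.g.:
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z_EXAMPLE", ChangeBatch=change_batch)
```

Repeating this per service gives each endpoint its own independent failover decision, which is exactly what a single ALB record cannot provide.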

What you have at this point is a hostname for each service that will have a DNS answer that correctly points to the preferred, healthy endpoint.

What you don't yet have is a way to ensure that client requests go to the right place.

For this, create a CloudFront distribution with your public API hostname as an Alternate Domain Name. Define one CloudFront origin for each of these service endpoints (leave "Origin Path" blank), then create a cache behavior for each service with the appropriate path pattern, e.g. /api/svc1*, and select the matching origin. Whitelist any HTTP headers that your API needs to see.
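The shape of that distribution can be sketched as a partial DistributionConfig, again in Python with placeholder hostnames and an example header list (your API's actual headers will differ). One origin per service endpoint, one cache behavior per path pattern, each behavior targeting its matching origin:

```python
# Partial CloudFront DistributionConfig sketch: one origin per service
# endpoint, one path-pattern cache behavior per service. Hostnames and
# forwarded headers are illustrative placeholders.

SERVICES = ["svc1", "svc2"]

def origin(service):
    return {
        "Id": f"{service}-origin",
        "DomainName": f"{service}.api.example.com",  # the service endpoint
        "OriginPath": "",  # left blank, per the pattern above
        "CustomOriginConfig": {
            "HTTPPort": 80,
            "HTTPSPort": 443,
            "OriginProtocolPolicy": "https-only",
        },
    }

def behavior(service):
    return {
        "PathPattern": f"/api/{service}*",       # route by path...
        "TargetOriginId": f"{service}-origin",   # ...to the matching origin
        "ViewerProtocolPolicy": "https-only",
        "AllowedMethods": ["GET", "HEAD", "OPTIONS",
                           "PUT", "POST", "PATCH", "DELETE"],
        "ForwardedValues": {
            "QueryString": True,
            # Whitelist whatever headers your API needs (example list):
            "Headers": ["Authorization", "Accept"],
        },
    }

distribution_config = {
    "Aliases": ["api.example.com"],  # the public API hostname
    "Origins": [origin(s) for s in SERVICES],
    "CacheBehaviors": [behavior(s) for s in SERVICES],
}
```

Each behavior's TargetOriginId ties the path pattern back to the per-service endpoint, which is where the Route 53 health-check-driven failover happens.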

Finally, point DNS for your main hostname to CloudFront.

The clients will automatically connect to their nearest CloudFront edge location, and CloudFront -- after matching the path pattern to discover where to send the request -- will check the DNS for that service-specific endpoint and forward the request to the appropriate balancer.

CloudFront, in this application, is not a "CDN" per se, but rather a globally-distributed reverse proxy -- logically, a single destination for all your traffic, so no failover configuration is required on the main hostname for the API... so no more all-or-nothing routing. On the back side of CloudFront, those service endpoint hostnames ensure that requests are routed to a healthy destination based on the Route 53 health checks. CloudFront respects the TTL of these DNS records and will not cache DNS responses that it shouldn't.

Michael - sqlbot