
I have created a Lambda function which retrieves some data from DynamoDB and outputs some JSON. What I'm trying to do is run this function in Lambda@Edge and generate a response which I can cache using CloudFront.

The problem I'm facing is that my data in DynamoDB is replicated in (currently) two regions (us-east-2 and eu-west-1) using Global Tables, and Lambda@Edge obviously runs in many regions.

This stops me from being able to use AWS_REGION from the Lambda environment. For example, if a request ran in us-west-1, the environment variable would reflect that, and the function would try to retrieve the data from us-west-1 when it should actually go to us-east-2.

Whilst admittedly I've not tried this (yet), I wondered if perhaps I could set up my own latency-based routing in Route 53 to point, say, ddb.mydomain.com at the DynamoDB endpoints in the regions I use. Assuming SAN certs are set up, would that work?
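I imagine the records would look something like the sketch below, using the CDK. This is untested: ddb.mydomain.com and the zone ID are placeholders, stack is whatever CDK stack is in scope, and whether TLS would actually validate against the DynamoDB endpoints' certificates is exactly the part I'm unsure about.

// Hypothetical sketch: latency-routed CNAMEs pointing at the regional
// DynamoDB endpoints. Route 53 answers with whichever record has the
// lowest latency to the resolver.
import { CfnRecordSet } from 'aws-cdk-lib/aws-route53';

for (const region of ['us-east-2', 'eu-west-1']) {
  new CfnRecordSet(stack, `DdbLatency-${region}`, {
    hostedZoneId: '<Zone ID>', // hypothetical hosted zone
    name: 'ddb.mydomain.com',
    type: 'CNAME',
    setIdentifier: `ddb_${region}`, // required for latency routing
    region, // latency-based routing key
    ttl: '60',
    resourceRecords: [`dynamodb.${region}.amazonaws.com`],
  });
}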

I thought perhaps I could map regions in the code, per the example below:

// Stand-in for the real Lambda environment, for illustration only
const process = { env: { AWS_REGION: 'us-east-1' } };

// Map each replica region to the regions it should serve
const regions = {
  'eu-west-1': ['eu-west-1', 'eu-central-1', '...'],
  'us-east-2': ['us-west-1', 'us-east-1', '...'],
};

const activeRegions = Object.keys(regions);

// Pick the replica whose list contains the current region,
// falling back to the first replica
const region = activeRegions.find(
  key => regions[key].includes(process.env.AWS_REGION)
) || activeRegions[0];

console.log(region); // us-east-2

This feels like it'd be more maintenance than it's worth and relies on me making assumptions about the best region to pick. I'd also have to keep my list of regions up to date.

I could use just the first two letters of the region to slightly reduce the need to update it when new data centres open, but it's still not ideal:

// Stand-in for the real Lambda environment, for illustration only
const process = { env: { AWS_REGION: 'ca-central-1' } };

// Map each replica region to two-letter region prefixes
const regions = {
  'eu-west-1': ['eu', 'sa', 'ap', '...'],
  'us-east-2': ['us', 'ca', 'sa', '...'],
};

const activeRegions = Object.keys(regions);

const key = activeRegions.find(
  key => regions[key].includes(
    process.env.AWS_REGION.substring(0, 2) // Just the first 2 letters
  )
) || activeRegions[0];

console.log(key); // us-east-2

I suspect I'm missing something obvious which would allow me to sensibly pick, from Lambda@Edge, a region in which my data exists.

Edit

I've since found this: an AWS Lambda@Edge workshop, since removed, which suggests a similar approach to the above. Why it was removed, I don't know.

function updateDynamoDbClientRegion(request) {
    let region;

    // Check if the CloudFront viewer country header is available
    if (request.headers['cloudfront-viewer-country']) {
        const countryCode = request.headers['cloudfront-viewer-country'][0].value;
        // countryToRegionMapping is defined elsewhere in the workshop code
        region = countryToRegionMapping[countryCode];
    }

    // Swap in the client for the nearer region; ddb and ddbUS are
    // DynamoDB clients defined elsewhere in the workshop code
    if (region) {
        ddb = ddbUS;
    }
}

The README for said workshop now simply discusses the option of using Global Tables to reduce latency, but offers no insight as to how to pick the closest replica which has the data.

Edit 2

I've grabbed a copy of the latency data from CloudPing and pieced together the following gist, which works for now.

https://gist.github.com/benswinburne/06a00fab330dca93ea6df2552f73850a

The downside of this is obviously that the data is stale. CloudPing's API isn't nearly quick enough for this purpose, unfortunately, and as soon as I go to a remote resource to grab up-to-date data I may as well have just gone to a DynamoDB table in any region ¯\_(ツ)_/¯
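For posterity, the core idea in the gist boils down to something like this (a condensed sketch; the latency numbers here are made-up placeholders, not real CloudPing data):

// Pick the replica region with the lowest recorded latency from the
// region this Lambda@Edge invocation actually ran in.
const REPLICAS = ['us-east-2', 'eu-west-1'];

// Static snapshot of inter-region latencies in ms (illustrative values)
const LATENCIES = {
  'us-west-1': { 'us-east-2': 52, 'eu-west-1': 140 },
  'ca-central-1': { 'us-east-2': 25, 'eu-west-1': 80 },
  'eu-central-1': { 'us-east-2': 100, 'eu-west-1': 25 },
};

function nearestReplica(currentRegion) {
  const fromHere = LATENCIES[currentRegion];
  if (!fromHere) return REPLICAS[0]; // unknown region, fall back

  return REPLICAS.reduce((best, region) =>
    fromHere[region] < fromHere[best] ? region : best
  );
}

console.log(nearestReplica('ca-central-1')); // us-east-2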


2 Answers


Sorry, this is really old, but in case anyone needs a solution: let's say you have three replica regions: us-west-2, us-east-1, and eu-west-2.

In your CDK stack (or create the records manually in the console):

import { CfnRecordSet } from 'aws-cdk-lib/aws-route53';

const REGIONS = ['us-west-2', 'us-east-1', 'eu-west-2'];

REGIONS.forEach(region => {
  new CfnRecordSet(stack, `Latency ${region}`, {
    setIdentifier: `lbr_${region}`,
    name: 'lbr.example.com',
    type: 'TXT',
    hostedZoneId: '<Zone ID>',
    region, // latency-based routing key
    ttl: '31540000', // 1 year in seconds
    resourceRecords: [`"${region}"`]
  });
});

This will create three latency-routed TXT records, one per region, each containing its region name as the value. When you resolve lbr.example.com, Route 53 returns the record for whichever of those regions has the lowest latency to the caller.

In your server code:

import dns from 'dns/promises';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

const res = await dns.resolveTxt('lbr.example.com');
// TODO: handle errors, eg catch and default to us-east-1
const lowestLatencyRegion = res[0][0];

// Connect to DynamoDB in the lowest-latency replica region
const client = new DynamoDBClient({ region: lowestLatencyRegion });

You can verify this by deploying and then hitting the server from a proxy service in a different region. Log lowestLatencyRegion to see that it is indeed near one of your replica regions.

Edit: You may want to run a precheck of process.env.AWS_REGION to see if it's already one of your replica regions, and skip the TXT lookup if so. This will save you the few ms the lookup takes.
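A minimal sketch of that precheck, reusing the lbr.example.com record from above (the replica list and the us-east-1 fallback are assumptions you'd adjust):

import dns from 'dns/promises';

const REPLICA_REGIONS = ['us-west-2', 'us-east-1', 'eu-west-2'];

async function pickRegion() {
  // Already running in a replica region? Skip the DNS round trip.
  if (REPLICA_REGIONS.includes(process.env.AWS_REGION)) {
    return process.env.AWS_REGION;
  }
  try {
    const res = await dns.resolveTxt('lbr.example.com');
    return res[0][0];
  } catch {
    return 'us-east-1'; // DNS lookup failed, default to a known replica
  }
}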

khuezy

Regarding your last comment about Global Tables: there is currently no way to reconfigure an existing single-region table into a global table. There are two options, depending on whether your tables are already replicated (i.e. contain the same data or not). If they contain the same data (see the sketch after this list):

  1. Backup the table using DynamoDB backup
  2. Create a new global table
  3. Restore the table dump into the new global table
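For reference, steps 1 and 3 might look roughly like this with the AWS SDK for JavaScript v3 (a sketch only; the table and backup names are hypothetical, and you would still create the global table itself separately as in step 2):

import {
  DynamoDBClient,
  CreateBackupCommand,
  RestoreTableFromBackupCommand,
} from '@aws-sdk/client-dynamodb';

const ddb = new DynamoDBClient({ region: 'us-east-2' });

// 1. Back up the existing regional table
const { BackupDetails } = await ddb.send(new CreateBackupCommand({
  TableName: 'MyTable',
  BackupName: 'pre-global-migration',
}));

// 3. Restore the backup into the new table that will serve as the
//    global table's replica in this region
await ddb.send(new RestoreTableFromBackupCommand({
  TargetTableName: 'MyGlobalTable',
  BackupArn: BackupDetails.BackupArn,
}));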

If the tables are not replicated, the process would be slightly different:

  1. Export the data from the tables using Data Pipeline
  2. Create a new global table
  3. Import the dumps into the global table using data pipeline

Note that Data Pipeline does not support the new on-demand DynamoDB provisioning. If you were going down this route, you would need to reconfigure the tables to use the old-style provisioned capacity whilst you do the export.

I hope this helps. I think your question, by the end, was about moving to a global table, at which point your Lambda@Edge function will just use the nearest table. But I'm not sure whether that's what you needed help with?

EDIT: Just had a look and I now realise this doesn't really solve your problem. You still need to specify a region even with global tables (i.e. which region to read from, even though the data will be auto-replicated). So your question is still, which region to use for the read/write?

EDIT: Just to confirm, are you worried about hitting the wrong DB and missing your data, or about getting the closest DB to reduce latency? If the former, the Global Tables setup will work fine for you, as the data will be automatically replicated across regions when you write it to the local DB.

F_SO_K
I have global tables already, so that's not an issue. I need to know which region to pick. I'm not using every region with my global tables, so I need to pick a region 1. in which the data exists, and 2. with the lowest latency to whichever edge location the function ran in. – Ben Swinburne Apr 05 '19 at 08:57