
I am using Serverless to deploy a basic web scraping API. Having tested the scraping code before switching to the serverless setup, I found that the scraping code could take upwards of 60 seconds to complete based on the particular URLs it was scraping and the data it was retrieving.

I have set up the IAM permissions in my serverless.yml:

iamRoleStatements:
  - Effect: Allow
    Action:
      - lambda:InvokeFunction
    Resource: "*"

I have also set up my two functions:

functions:
  scrape:
    handler: handler.scrape
    memorySize: 1536MB
  
  triggerScrape:
    handler: handler.triggerScrape
    events:
      - http:
          path: /scrape
          method: get

And in my handler.js:

module.exports.triggerScrape = async (event, context) => {
  try {
    const invoke = lambda.invoke({
      FunctionName: 'my-api-v2-dev-scrape',
      InvocationType: 'Event',
      Payload: JSON.stringify({
        link: event.queryStringParameters['link'],
        batchId: event.queryStringParameters['batchId'],
      })
    })

    return {
      statusCode: 202,
      headers: {
        'Access-Control-Allow-Origin': '*'
      },
      body: JSON.stringify({ message: 'Scrape request recieved' })
    }
  } catch (err) {
    console.log(`Invoke error: ${err}`)
  }
}

module.exports.scrape = async (event) => {
  // Lengthy Puppeteer scrape code that gets data and saves it to a database
  // It does not need to return as part of the API call, it just needs to be triggered once
  // and the user will refresh the page later to see the results
}

When I GET the /scrape endpoint (handled by triggerScrape) I get 202: "Scrape request recieved", but as far as I can see the scrape function never runs. Running serverless logs -f scrape returns nothing.

Does anyone know how I can check if the function was actually run? Is it something to do with the async nature of the request? Thanks in advance for any advice you can give.

Ollie

2 Answers


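Your handler returns before the invoke request is ever sent. In the AWS SDK for JavaScript v2, lambda.invoke called without a callback only returns a request object; calling .promise() on it sends the request and returns a promise, which you then need to await: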

const invoke = await lambda.invoke({
  FunctionName: 'my-api-v2-dev-scrape',
  InvocationType: 'Event',
  Payload: JSON.stringify({
    link: event.queryStringParameters['link'],
    batchId: event.queryStringParameters['batchId'],
  })
}).promise();

Alternatively you could pass a callback function to invoke.

When debugging these kinds of issues it can be useful to add console.log calls before and after the work you expect to be done, and log what it is returning.
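A minimal sketch of that kind of logging, assuming the aws-sdk v2 client. To keep it runnable without AWS credentials, the client is passed in and replaced here by a hypothetical stub (stubLambda and makeTriggerScrape are names made up for this example); in the real handler you would pass new AWS.Lambda() instead.

```javascript
// Hypothetical stand-in for the aws-sdk v2 Lambda client so the sketch runs
// locally; in the real handler this would be `new AWS.Lambda()`.
const stubLambda = {
  invoke: (params) => ({
    // With InvocationType 'Event', a successful invoke reports StatusCode 202.
    promise: async () => ({ StatusCode: 202, Payload: '' }),
  }),
};

// Builds the handler around whichever client is passed in.
const makeTriggerScrape = (lambda) => async (event) => {
  const params = {
    FunctionName: 'my-api-v2-dev-scrape',
    InvocationType: 'Event',
    Payload: JSON.stringify({
      link: event.queryStringParameters.link,
      batchId: event.queryStringParameters.batchId,
    }),
  };

  // Log before and after the call so the logs show whether it completed.
  console.log('About to invoke', params.FunctionName);
  const result = await lambda.invoke(params).promise();
  console.log('Invoke returned', JSON.stringify(result));

  return {
    statusCode: 202,
    headers: { 'Access-Control-Allow-Origin': '*' },
    body: JSON.stringify({ message: 'Scrape request received' }),
  };
};

const triggerScrape = makeTriggerScrape(stubLambda);
```

If the second log line never shows up in the triggerScrape logs, the handler is returning (or throwing) before the invoke call has finished.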

andrhamm
  • Thanks so much for your response. I've added in your suggested code and it seems like this setup causes the function to actually wait for the return of the invocation as it times out after 6 seconds. As for the `scrape` function, I have set its individual timeout to 5 minutes to make sure it can perform the full scrape in good time. Is there any edit to your example that would cause the invoke to happen but then return immediately after? Thanks again! – Ollie Sep 02 '20 at 13:47
  • You are passing InvocationType=Event which will make the API call to trigger the lambda and return immediately (before the actual lambda is finished). Is it possible your triggering lambda doesn't have permission to invoke the scrape lambda? Consult the docs for `invoke`: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Lambda.html#invoke-property – andrhamm Sep 02 '20 at 14:26
  • Thanks for your help @andrhamm - I ended up using your promise suggestion in a slightly different way (see my answer below). For some reason, wrapping the invoke call in a promise itself and awaiting that saw that the Lambda function got successfully triggered before returning. Although I am unsure why wrapping it in a promise itself was any different from your suggestion of `await lambda.invoke().promise()`! But at least it works now! – Ollie Sep 02 '20 at 14:32

After a lot of painful trial and error and SO searching, I finally came across this answer in another thread: https://stackoverflow.com/a/54126705/3011431

It seems to be the only thing that worked for me, although given the number of topics surrounding this it seems like all manner of things can fix the implementation for any given situation.

I ended up wrapping my Lambda invoke call in a Promise:

module.exports.triggerScrape = async (event) => {
  try {
    await new Promise((resolve, reject) => {
      lambda.invoke({
        FunctionName: 'my-api-v2-dev-scrape',
        InvocationType: 'Event',
        Payload: JSON.stringify({
          link: event.queryStringParameters['link'],
          batchId: event.queryStringParameters['batchId'],
        })
      }, (err, data) => {
        if (err) {
          console.log(err, err.stack)
          reject(err)
        }
        else {
          resolve(data)
        }
      })
    })

    return {
      statusCode: 202,
      headers: {
        'Access-Control-Allow-Origin': '*'
      },
      body: JSON.stringify({ message: 'Audit request recieved' })
    }
  } catch (err) {
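    console.log(`Invoke error: ${err}`)
    // Return an explicit error response instead of falling through with undefined
    return {
      statusCode: 500,
      headers: {
        'Access-Control-Allow-Origin': '*'
      },
      body: JSON.stringify({ message: 'Failed to trigger scrape' })
    }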
  }
}

With this change I got the 202 response and the scrape was actually triggered: logs now appear for the scrape function, and the scraped data was pushed to my database.
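For anyone wondering why the original version never triggered anything, here is a rough sketch of what I believe was going on. In aws-sdk v2, calling invoke without a callback only builds a request object, and nothing is sent until you pass a callback or call .send() / .promise() on it. FakeLambda below is a hypothetical stand-in that mimics that behaviour so the example runs locally; it is not the real SDK.

```javascript
// FakeLambda is a hypothetical stand-in that mimics how the aws-sdk v2 client
// defers sending a request; it is not the real SDK.
class FakeLambda {
  constructor() {
    this.sent = 0; // number of requests actually "sent"
  }

  invoke(params, callback) {
    const send = () => {
      this.sent += 1;
      return Promise.resolve({ StatusCode: 202 });
    };
    if (callback) {
      // Callback style: the request is sent straight away.
      send().then((data) => callback(null, data));
      return;
    }
    // No callback: nothing is sent until .promise() is called.
    return { promise: send };
  }
}

const params = { FunctionName: 'my-api-v2-dev-scrape', InvocationType: 'Event' };

async function demo() {
  const lambda = new FakeLambda();

  lambda.invoke(params); // request object built, never sent
  const afterBareCall = lambda.sent;

  await lambda.invoke(params).promise(); // sent, and awaited
  const afterPromise = lambda.sent;

  // Sent via callback, wrapped in a Promise so it can be awaited.
  await new Promise((resolve, reject) => {
    lambda.invoke(params, (err, data) => (err ? reject(err) : resolve(data)));
  });
  const afterCallback = lambda.sent;

  return { afterBareCall, afterPromise, afterCallback };
}
```

With this stand-in, demo() resolves to { afterBareCall: 0, afterPromise: 1, afterCallback: 2 }: the bare call sends nothing, while both the .promise() form and the callback-wrapped form send the request before the handler moves on.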

Ollie