I am using the Serverless Framework to deploy a basic web scraping API. I tested the scraping code before switching to the serverless setup, and it can take upwards of 60 seconds to complete, depending on the URLs being scraped and the data being retrieved.
I have set up the correct IAM permissions in my serverless.yml:
iamRoleStatements:
  - Effect: Allow
    Action:
      - lambda:InvokeFunction
    Resource: "*"
I have also set up my two functions:
functions:
  scrape:
    handler: handler.scrape
    memorySize: 1536
  triggerScrape:
    handler: handler.triggerScrape
    events:
      - http:
          path: /scrape
          method: get
And in my handler.js:
// AWS SDK Lambda client used to invoke the scrape function
const AWS = require('aws-sdk')
const lambda = new AWS.Lambda()

module.exports.triggerScrape = async (event, context) => {
  try {
    const invoke = lambda.invoke({
      FunctionName: 'my-api-v2-dev-scrape',
      InvocationType: 'Event',
      Payload: JSON.stringify({
        link: event.queryStringParameters['link'],
        batchId: event.queryStringParameters['batchId'],
      })
    })
    return {
      statusCode: 202,
      headers: {
        'Access-Control-Allow-Origin': '*'
      },
      body: JSON.stringify({ message: 'Scrape request received' })
    }
  } catch (err) {
    console.log(`Invoke error: ${err}`)
  }
}
module.exports.scrape = async (event) => {
  // Lengthy Puppeteer scrape code that gets data and saves it to a database.
  // It does not need to return anything as part of the API call; it just needs
  // to be triggered once, and the user will refresh the page later to see the results.
}
When I send a GET request to the /scrape endpoint (the triggerScrape function), I get a 202 response with "Scrape request received", but as far as I can tell the scrape function never runs. When I run serverless logs -f scrape, nothing is returned.
Does anyone know how I can check whether the scrape function was actually run? Is it something to do with the async nature of the request?
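For example, I am wondering whether I need to call .promise() on the result of lambda.invoke() and await it before returning, so the invocation request is actually sent before triggerScrape finishes. This is only a rough sketch of what I mean (not tested; it assumes the same lambda client and payload as in my handler above):

const result = await lambda.invoke({
  FunctionName: 'my-api-v2-dev-scrape',
  InvocationType: 'Event',
  Payload: JSON.stringify({
    link: event.queryStringParameters['link'],
    batchId: event.queryStringParameters['batchId'],
  })
}).promise()
// With InvocationType 'Event', the response's StatusCode should be 202 if the request was accepted
console.log(`Invoke status code: ${result.StatusCode}`)

Thanks in advance for any advice you can give.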