I've been working on testing a large sitemap, same as this question here.

The answer worked well up to a point, but then I started getting status code 429 (Too Many Requests), because the web site thinks I'm carrying out a Denial of Service attack.

How do I overcome that issue?

const results = []   // collects {url, status} for each link

cy.request('sitemap.xml')
  .then(response => {
    return Cypress.$(response.body).find('loc')
      .toArray().map(el => el.innerText)
  })
  .then(urls => {
    urls.forEach(url => {
      cy.request({url, failOnStatusCode: false})
        .then(response => results.push({url, status: response.status}))
    })
  })
Roberta D
  • Does this answer your question? [How to rate-limit ajax requests?](https://stackoverflow.com/questions/5031501/how-to-rate-limit-ajax-requests) – bogdanoff Sep 17 '22 at 03:49
  • @bogdanoff - this is running in the Cypress framework using their specific fetch command `cy.request()` which provides the `failOnStatusCode: false` setting to properly test if the link is valid. Cypress has lodash included, but `cy.request()` can't be wrapped the way your linked question does it (commands run on a queue). – Roberta D Sep 17 '22 at 04:06
  • I also looked at this [Extract sitemap URLs and cy.request() each URL per a unique test](https://stackoverflow.com/questions/67548643/extract-sitemap-urls-and-cy-request-each-url-per-a-unique-test-cypress) but still get the status codes 429 part way through the list. – Roberta D Sep 17 '22 at 04:13

1 Answer

To rate-limit the requests, just add a cy.wait() in the loop.

The wait time depends on the site's maximum request rate. For example, StackOverflow has a limit of 30 requests per second, which works out to at least 34ms between requests, so I'd use a wait of 50ms to leave some margin.
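As a rough sketch of that arithmetic, the wait can be derived from the rate limit instead of being hard-coded. `waitForRate` and its `safetyFactor` parameter are hypothetical names for illustration, not part of Cypress, and the 1.5 default is only an assumed margin.

```javascript
// Hypothetical helper (not a Cypress API): derive a per-request wait in ms
// from a site's maximum request rate, padded by an assumed safety factor.
function waitForRate(maxPerSecond, safetyFactor = 1.5) {
  if (!(maxPerSecond > 0)) {
    throw new RangeError('maxPerSecond must be a positive number')
  }
  // Multiply before dividing so round numbers stay exact (1500 / 30 = 50).
  return Math.ceil((1000 * safetyFactor) / maxPerSecond)
}

// For the 30 requests/second example this gives a 50ms wait.
const waitMs = waitForRate(30)
```

The result can then be passed to the throttle in the loop, e.g. `cy.wait(waitMs, {log:false})`.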

There are a couple of other optimizations you could make to the code:

  • use the HEAD method instead of GET

  • apply the {log:false} option to the commands, since Cypress logging involves updating the runner UI (use console.log for progress instead)

  • only store the bad urls, which reduces the memory footprint

const results = [];

cy.request('sitemap.xml')
  .its('body')
  .then(xml => {
    return [...Cypress.$(xml).find('loc')].map(el => el.innerText)
  })
  .then(urls => {
    urls.forEach(url => {
      cy.request({
        method:'HEAD', 
        url, 
        failOnStatusCode:false, 
        log:false                                           
      })  
      .then(response => {
        console.log(url)                                       // progress
        if (response.status !== 200) {
          results.push({url, status: response.status})         // fails only
        }
      })
      cy.wait(50,{log:false})                                  // throttle
    })
  })

cy.then(() => {
  cy.log('Bad URLs', results)
})
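
If an occasional 429 still slips through despite the throttle, one option is to treat it as "try again later" and not as a bad link. The following is only a sketch: `retryAfterMs` and `checkUrl` are hypothetical helpers, and it assumes the server sends a standard `Retry-After` header and that `response.headers` exposes it under a lower-cased key. It retries once and falls back to a 1 second wait when the header is missing.

```javascript
// Hypothetical helper: convert a Retry-After header value (delay in seconds
// or an HTTP date) to milliseconds, with a fallback when it is absent.
function retryAfterMs(headerValue, fallbackMs = 1000) {
  if (headerValue === undefined || headerValue === null || headerValue === '') {
    return fallbackMs
  }
  const seconds = Number(headerValue)
  if (Number.isFinite(seconds)) {
    return Math.max(0, Math.ceil(seconds * 1000))
  }
  const date = Date.parse(headerValue)
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - Date.now())
}

// Hypothetical wrapper: check one url, retrying a single time after a 429.
function checkUrl(url, results) {
  const options = {method: 'HEAD', url, failOnStatusCode: false, log: false}
  cy.request(options).then(response => {
    if (response.status === 429) {
      // Assumption: header names are lower-cased on the response object
      cy.wait(retryAfterMs(response.headers['retry-after']), {log: false})
      cy.request(options).then(retry => {
        if (retry.status !== 200) results.push({url, status: retry.status})
      })
    } else if (response.status !== 200) {
      results.push({url, status: response.status})
    }
  })
}
```

Inside the loop, `checkUrl(url, results)` would take the place of the `cy.request(...).then(...)` call, with the `cy.wait(50,{log:false})` throttle kept after it.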
Fody