2

I built a scraper that collects data from a page, formats it and adds it to a database. It then uses the scraped data to build models, except for one value that it scrapes. Everything is wrapped in Celery so that tasks run in the background.

@router.post("/run/{id}")
async def create(id: str):
    wallet_reputation.delay(id)

    return {"Status": "Task successfully add to execute"}

The endpoint above works fine. The ID passed to it is unique, and there are about 100 such values. To automate building a model for each ID, I made the following endpoint, which I call from time to time (the scraped data changes, so I need to update my models).

@router.post("/run")
async def create_all():
    for address in all_addresses_generator():
        wallet_reputation.delay(address)

    return {"Status": "Tasks successfully add to execute"}

I receive this error:

2022-03-26T15:25:52.051854+00:00 heroku[worker.1]: Process running mem=543M(104.1%)
2022-03-26T15:25:52.073256+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2022-03-26T15:26:02.875701+00:00 app[worker.1]: [2022-03-26 15:26:02,871: ERROR/ForkPoolWorker-8] Task walletReputation[2cca3c3e-8c58-4983-bbae-e55e52f33c1a] raised unexpected: TimeoutException('', None, ['#0 0x556bcd4bc7d3 <unknown>', '#1 0x556bcd218688 <unknown>', '#2 0x556bcd24ec21 <unknown>', '#3 0x556bcd24ede1 <unknown>', '#4 0x556bcd281d74 <unknown>', '#5 0x556bcd26c6dd <unknown>', '#6 0x556bcd27fa0c <unknown>', '#7 0x556bcd26c5a3 <unknown>', '#8 0x556bcd241ddc <unknown>', '#9 0x556bcd242de5 <unknown>', '#10 0x556bcd4ed49d <unknown>', '#11 0x556bcd50660c <unknown>', '#12 0x556bcd4ef205 <unknown>', '#13 0x556bcd506ee5 <unknown>', '#14 0x556bcd4e3070 <unknown>', '#15 0x556bcd522488 <unknown>', '#16 0x556bcd52260c <unknown>', '#17 0x556bcd53bc6d <unknown>', '#18 0x7f8e32957609 <unknown>', ''])
2022-03-26T15:26:02.875723+00:00 app[worker.1]: Traceback (most recent call last):
2022-03-26T15:26:02.875724+00:00 app[worker.1]:   File "/app/.heroku/python/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
2022-03-26T15:26:02.875724+00:00 app[worker.1]:     R = retval = fun(*args, **kwargs)
2022-03-26T15:26:02.875724+00:00 app[worker.1]:   File "/app/.heroku/python/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
2022-03-26T15:26:02.875725+00:00 app[worker.1]:     return self.run(*args, **kwargs)
2022-03-26T15:26:02.875725+00:00 app[worker.1]:   File "/app/tasks.py", line 40, in wallet_reputation
2022-03-26T15:26:02.875725+00:00 app[worker.1]:     WalletReputation(id).add_reputation_to_db()
2022-03-26T15:26:02.875727+00:00 app[worker.1]:   File "/app/agents/walletReputation.py", line 261, in add_reputation_to_db
2022-03-26T15:26:02.875727+00:00 app[worker.1]:     nc_balance=self.nc_balance(),
2022-03-26T15:26:02.875727+00:00 app[worker.1]:   File "/app/agents/walletReputation.py", line 162, in nc_balance
2022-03-26T15:26:02.875727+00:00 app[worker.1]:     WebDriverWait(self.driver, 20)
2022-03-26T15:26:02.875727+00:00 app[worker.1]:   File "/app/.heroku/python/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 89, in until
2022-03-26T15:26:02.875728+00:00 app[worker.1]:     raise TimeoutException(message, screen, stacktrace)
2022-03-26T15:26:02.875728+00:00 app[worker.1]: selenium.common.exceptions.TimeoutException: Message: 
2022-03-26T15:26:02.875729+00:00 app[worker.1]: Stacktrace:
2022-03-26T15:26:02.875729+00:00 app[worker.1]: #0 0x556bcd4bc7d3 <unknown>
2022-03-26T15:26:02.875729+00:00 app[worker.1]: #1 0x556bcd218688 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #2 0x556bcd24ec21 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #3 0x556bcd24ede1 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #4 0x556bcd281d74 <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #5 0x556bcd26c6dd <unknown>
2022-03-26T15:26:02.875730+00:00 app[worker.1]: #6 0x556bcd27fa0c <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #7 0x556bcd26c5a3 <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #8 0x556bcd241ddc <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #9 0x556bcd242de5 <unknown>
2022-03-26T15:26:02.875731+00:00 app[worker.1]: #10 0x556bcd4ed49d <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #11 0x556bcd50660c <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #12 0x556bcd4ef205 <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #13 0x556bcd506ee5 <unknown>
2022-03-26T15:26:02.875732+00:00 app[worker.1]: #14 0x556bcd4e3070 <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #15 0x556bcd522488 <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #16 0x556bcd52260c <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #17 0x556bcd53bc6d <unknown>
2022-03-26T15:26:02.875733+00:00 app[worker.1]: #18 0x7f8e32957609 <unknown>

I don't understand why I suddenly get an error if the previous endpoint that performs the same task in Celery works normally. Below, I paste the code of the generator and class method, on which the error pops up.

def all_addresses_generator():
    for row in session.query(DbNcTransaction).all():
        yield row.to


def nc_balance(self):
    base_url = "https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d?a="
    self.driver.get(base_url + self.address)

    nc_balance = (
        WebDriverWait(self.driver, 20)
            .until(
                EC.presence_of_element_located(
                    (By.CSS_SELECTOR, "#ContentPlaceHolder1_divFilteredHolderBalance")
                )
            )
            .text
    )

    nc_balance = nc_balance.split()[1]
    nc_balance = round(float(nc_balance.replace(",", "")), 2)

    return nc_balance

How can I deal with this?

undetected Selenium
Kacper

3 Answers

5

This error message...

2022-03-26T15:25:52.051854+00:00 heroku[worker.1]: Process running mem=543M(104.1%)
2022-03-26T15:25:52.073256+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2022-03-26T15:26:02.875701+00:00 app[worker.1]: [2022-03-26 15:26:02,871: ERROR/ForkPoolWorker-8] Task walletReputation[2cca3c3e-8c58-4983-bbae-e55e52f33c1a] raised unexpected: TimeoutException

...implies that the TimeoutException was raised by the task running in ForkPoolWorker-8 after your program had exceeded the memory quota.


Deep Dive

This is a classic example of an out-of-memory error, where memory usage has exceeded the maximum level.

Process running mem=543M(104.1%)

At 543M the process is at 104.1% of its quota, so per the dyno memory specs you are presumably on one of these dyno types:

free, hobby and standard-1x have 512 MB


Dynos

The Heroku Platform uses the container model to run and scale all Heroku apps, and the containers are called dynos. Dynos are isolated, virtualized Linux containers that are designed to execute code based on a user-specified command. Apps can scale to any specified number of dynos based on their resource demands.


Error R14 (Memory quota exceeded)

At times a dyno may require memory in excess of its assigned quota. In those exceptional cases the dyno will page to swap space to continue running, which may at times degrade process performance. When this happens, Heroku starts logging the R14 error. The reported memory figure is the total of memory swap, rss and cache, as follows:

2011-05-03T17:40:10+00:00 app[worker.1]: Working
2011-05-03T17:40:10+00:00 heroku[worker.1]: Process running mem=1028MB(103.3%)
2011-05-03T17:40:11+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2011-05-03T17:41:52+00:00 app[worker.1]: Working

Resolving R14 memory error

In these scenarios you will want your application to use less memory, and you may need to adjust one of the following factors:

  • number of threads
  • largest possible request
  • the distribution of incoming requests
  • decrease thread count to reduce your memory needs (but this may lower your throughput)
  • add capacity via scaling out e.g. adding additional dynos/servers

Adding capacity generally works well: as more servers/dynos come into operation, the requests are spread out, and it becomes less likely that all threads on an individual machine are processing the largest request at the same time. However, in the long run the best way to reduce your overall memory requirement is to reduce object allocation.


This use case

In this use case, judging by the first code block (def create(id: str)), your application copes when a model is built for a single ID per request. When you call create_all() to build a model for each of the roughly 100 ID values, you start seeing the error.


Solution

You can adopt a different approach instead of building the models for all IDs in one go. If possible, divide the ID values into batches, with each batch containing a number of models small enough that memory usage doesn't cross the threshold.
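
The batching idea could be sketched roughly as follows. This is only an illustration under assumptions: `batched` and `dispatch_in_batches` are hypothetical helper names, and `enqueue_batch` stands for whatever callable hands one batch to the worker, for example the `.delay` method of a hypothetical Celery task that processes the IDs of a batch one after another. Batching only lowers peak memory if the IDs inside a batch are handled sequentially rather than all at once.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists holding at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def dispatch_in_batches(
    addresses: Iterable[str],
    size: int,
    enqueue_batch: Callable[[List[str]], object],
) -> int:
    """Hand the addresses to `enqueue_batch` one batch at a time.

    Returns the number of batches that were handed over.
    """
    count = 0
    for batch in batched(addresses, size):
        enqueue_batch(batch)
        count += 1
    return count
```

In the create_all() endpoint, the loop over all_addresses_generator() would then become a single call such as dispatch_in_batches(all_addresses_generator(), 10, some_batch_task.delay), where some_batch_task is an assumed name and the batch size of 10 is arbitrary.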

undetected Selenium
1

The issue is not (initially) with Selenium raising TimeoutException, but with Heroku raising the R14 (Memory quota exceeded) error, as shown on the second line of the error log you provided. The RAM usage of your application has exceeded the available quota. Since you are using a free dyno, the maximum RAM (quota) is 512 MB. However, your application, as shown on the first line of the error log (i.e., Process running mem=543M(104.1%)), requires more than that amount.

Thus, you may try either reducing the number of workers (in case you are using more than one), or reducing the RAM usage of your app, or upgrading to a different Heroku Dyno (see How do I upgrade from Heroku's free tier).
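
As a rough illustration of reducing the number of workers: the process name ForkPoolWorker-8 in the log suggests the pool may be running at least eight processes, although that is only an inference from the name. The setting names `worker_concurrency` and `worker_max_tasks_per_child` are Celery configuration options; the values below and the variable name `CELERY_WORKER_SETTINGS` are assumptions chosen for the example.

```python
# Illustrative values only; tune them against the dyno's memory quota.
CELERY_WORKER_SETTINGS = {
    # Number of pool processes that run tasks in parallel.
    "worker_concurrency": 2,
    # Replace a pool process after it has run this many tasks,
    # which releases any memory that process accumulated.
    "worker_max_tasks_per_child": 10,
}

# With an existing Celery app object (assumed here to be named `app`):
# app.conf.update(CELERY_WORKER_SETTINGS)
```

Fewer parallel processes means fewer browsers open at the same time, at the cost of lower throughput.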

Update

Additionally, it would be preferable to instantiate the WebDriverWait once (at startup), not multiple times (you may also need to increase the timeout value in WebDriverWait):

wait = WebDriverWait(driver, 10)

and then use as:

nc_balance = wait.until(....
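
A minimal sketch of creating the wait object once and reusing it, assuming a small wrapper class. `PageSession`, `wait_factory` and `wait_for` are hypothetical names; with Selenium, `wait_factory` would be `WebDriverWait` itself, which is called as WebDriverWait(driver, timeout). The 30-second default is an arbitrary example of a longer timeout.

```python
from typing import Any, Callable


class PageSession:
    """Holds one driver and one wait object for the lifetime of a task."""

    def __init__(
        self,
        driver: Any,
        wait_factory: Callable[[Any, float], Any],
        timeout: float = 30,
    ) -> None:
        self.driver = driver
        # Built once here; every lookup below reuses the same object.
        self.wait = wait_factory(driver, timeout)

    def wait_for(self, condition: Callable[[Any], Any]) -> Any:
        """Block until `condition` is satisfied and return its result."""
        return self.wait.until(condition)

    def close(self) -> None:
        # Quit the browser so its process does not outlive the task.
        self.driver.quit()
```

Calling close() when a task finishes (for example in a finally block) also helps with memory, since a browser that is never quit keeps its process alive.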
Chris
  • OK, I upgraded my account to have more RAM, and that was not the source of the problem. I still get the same error, just without the `Process running mem=543M(104.1%)` line. Do you have any other ideas? – Kacper Apr 03 '22 at 17:41
1

As simple as this answer might be, it took me a long time to figure out. Heroku cannot process a request for more than 30 seconds. This is why you're getting the TimeoutException.

Web requests processed by Heroku are directed to your dynos via a number of Heroku routers. These requests are intended to be served by your application quickly. Best practice is to get the response time of your web application to be under 500ms, this will free up the application for more requests and deliver a high quality user experience to your visitors. Occasionally a web request may hang or take an excessive amount of time to process by your application. When this happens the router will terminate the request if it takes longer than 30 seconds to complete.

The countdown for this 30 second timeout begins after the entire request (all request headers and, if applicable, the request body) has been sent from the router to the dyno. The request must then be processed in the dyno by your application, and a response delivered back to the router, within 30 seconds to avoid the timeout.

Read more https://devcenter.heroku.com/articles/request-timeout

SOLUTION: Use another platform to deploy

Bruno