
I want to execute a function that would run in parallel to the FastAPI app.

That function fetches data from the web and writes it to the database; the data could be millions of records and could take more than 5 minutes.

The API accepts the data, queries the DB, and responds to the client.

What I am currently doing is:

```python
@app.on_event("startup")
async def startup_event():
    await Extractor().callit()


class Extractor:
    async def callit(self):
        # Insert the Tor IPs using the data models
        indicators = []
        for ip in await scrap_web_ips():
            if ip == '':
                continue
            q_indicator = self.check_ip_status(ip)
            if q_indicator is None:
                # New indicator: build the record for a bulk insert later
                indicator = Indicators(indicator=ip, tor=True,
                                       last_seen=get_current_date_in_iso(),
                                       first_seen=get_current_date_in_iso(),
                                       white_list=self.get_vt_stats(ip) == 0)
                indicators.append(indicator)
            else:
                # Existing indicator: refresh its last_seen timestamp
                print("Updating the last_seen of the indicator.")
                q_indicator.last_seen = get_current_date_in_iso()
                q_indicator.tor = True
                self.session.commit()

        print(self.insert_data(indicators))
```

And then my APIs.

The function is blocking the API calls.

I don't know how to handle this now.

– Root
  • I can see you await the asynchronous function `scrap_web_ips()`. Does this function have a return statement or a yield statement? If it has a yield statement, it becomes an asynchronous generator, and then this would probably work without blocking. However, if it has a return statement, the entire code block will wait for the variable to form before the for loop starts looping over it. Forming the variable will be non-blocking, but then looping over each `ip` will block the entire thing. – JarroVGIT Feb 04 '21 at 12:18
  • 1
    Better choice would be use Celery or some lightweight task scheduler.. once FastAPI starts kick a job and handle it to celery or other task scheduler..now onwards, it’s not your FastAPI responsibility to scrape and store data.. This approach will help to serve your users better than earlier.. – hackwithharsha Feb 07 '21 at 02:55
  • Does this answer your question? [FastAPI python: How to run a thread in the background?](https://stackoverflow.com/questions/70872276/fastapi-python-how-to-run-a-thread-in-the-background) – Chris Apr 25 '23 at 05:40
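
A minimal sketch of the comment's suggestion, assuming Celery with a Redis broker; the task name `run_extractor`, the module layout, and the broker URL are illustrative, not from the question:

```python
# tasks.py
from celery import Celery

celery_app = Celery("scraper", broker="redis://localhost:6379/0")

@celery_app.task
def run_extractor():
    # Runs inside a Celery worker process, so a long (even fully
    # blocking) scrape-and-store job never touches the FastAPI
    # event loop.
    ...

# main.py
from fastapi import FastAPI
from tasks import run_extractor

app = FastAPI()

@app.on_event("startup")
async def startup_event():
    # Enqueue the job and return immediately; a worker started with
    # `celery -A tasks worker` picks it up in the background.
    run_extractor.delay()
```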

1 Answer


It is likely that the `callit()` function is actually not asynchronous at all. You state that it collects something from the web and stores it in a database; those two actions are prime candidates for an asynchronous approach. That would look something like this:

```python
async def callit():
    result = await get_from_web_async()   # line 1
    await store_in_db_async(result)       # line 2
```

On line 1 it yields control back to the event loop while it waits for the result of `get_from_web_async()`, and on line 2 it yields control back again while storing the result in the database.

But without the code of your `callit()` function, this is just a guess :)

EDIT: Based on the additional information, the working theory is that the call to `scrap_web_ips()` does not return an async generator. Therefore, the entire result is awaited before being looped over. While the call itself does not block while its result is awaited, the loop it is used in will be completely blocking.
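
To make the difference concrete, here is a minimal sketch; the function bodies are stand-ins, since the real `scrap_web_ips()` implementation is not shown in the question:

```python
import asyncio

# Variant A: a coroutine that returns the complete list. The `await`
# suspends only until the whole list is ready; the plain `for` loop
# that follows never yields control back to the event loop.
async def scrap_web_ips_list():
    await asyncio.sleep(1)          # stand-in for the network fetch
    return [f"10.0.0.{i}" for i in range(1000)]

# Variant B: an async generator that yields one IP at a time. Every
# `async for` step is an await point, so other tasks (such as incoming
# API requests) can be scheduled between iterations.
async def scrap_web_ips_gen():
    for i in range(1000):
        await asyncio.sleep(0)      # stand-in for per-item I/O
        yield f"10.0.0.{i}"

async def consume():
    # Variant A: the loop runs to completion without yielding.
    for ip in await scrap_web_ips_list():
        pass
    # Variant B: cooperative; the event loop breathes between items.
    async for ip in scrap_web_ips_gen():
        pass
```

Note that even with an async generator, any CPU-heavy or blocking work inside the loop body (such as synchronous database commits) will still block the event loop between await points.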

– JarroVGIT