
I'm building a web app that fetches data from an API and displays it. For that I'm using Flask and the requests library. Because the API is not well laid out, I need to make a bunch of API calls to get all the data I need.

Here is what the simplified folder structure looks like:

app.py
api/
  api.py

To avoid overloading the API by sending hundreds of API requests on every GET request, I tried to implement a function that fetches the data on webserver start, stores it in a variable, and refreshes the data after a specific interval. Here is a simplified API class and refresh function:

"""
The API class gets initizialized on webserver start
"""
class API:
    def __init(self):
        self.API_KEY = 'xxx-xxx'
        self.BASE_URL = 'https://xxxxxxxx.com/3'
        self.HEADER = {
            'X-Api-Key': f'{self.API_KEY}',
            'Accept': 'application/json'
        }

    self.session = requests.session()
    self.session.headers.update(self.HEADER)

    self.data = {}
    self.refresh_time = 900 # how long the function should wait until next refresh

    threading.Thread(target=refresh_data).start()


def refresh_data(self):
    while True:
        self._refresh() # function that fetches the data from the API and stores/refreshes the in the self.data json
        time.sleep(self.refresh_time)

I know it's probably not the best way to handle this, but in my venv it works without problems.

When I make this web app production ready and deploy it to Windows IIS with wfastcgi, the webserver gets restarted randomly (I didn't notice any pattern), so the API class gets initialized multiple times, meaning the refresh function gets started multiple times.

Here is some logging of the webserver:

2023-06-05 07:54:29,298 [MainThread  ] [            <module>()] [INFO ]  Setting up APIs...         # Log from webserver
2023-06-05 07:54:29,299 [MainThread  ] [            __init__()] [DEBUG]  API Class init             > debug log in API class
2023-06-05 07:54:29,377 [MainThread  ] [               index()] [INFO ]  GET from 192.168.18.125    # GET request 
2023-06-05 07:54:30,001 [MainThread  ] [            <module>()] [INFO ]  Setting up APIs...         # Log from webserver
2023-06-05 07:54:30,001 [MainThread  ] [            <module>()] [INFO ]  Setting up APIs...         # Log from webserver
2023-06-05 07:54:30,001 [MainThread  ] [            __init__()] [DEBUG]  API Class init             > 
2023-06-05 07:54:30,001 [MainThread  ] [            __init__()] [DEBUG]  API Class init             > debug log from the same API class
2023-06-05 07:54:30,002 [Thread-1 (_s] [        refresh_data()] [INFO ]  Checking data...           
2023-06-05 07:54:30,002 [Thread-1 (_s] [        refresh_data()] [INFO ]  Checking data...
2023-06-05 07:54:30,006 [Thread-1 (_s] [            _refresh()] [INFO ]  Refreshing data...
2023-06-05 07:54:30,007 [Thread-1 (_s] [       get_something()] [INFO ]  Getting data...

I already did some research; maybe this helps.

  1. A wfastcgi GitHub question: I thought the server was being restarted because I was writing logs to a file in the webserver folder, so I wrote the logs outside the folder, but the server kept restarting (I also tried to edit the web.config, but nothing worked for me).
  2. A Microsoft Developer Network question: a similar question I found.

Can anyone explain this behavior to me? I would appreciate any suggestions on how to handle a timed API call, or in other words, a queue.

EDIT:

I found out that IIS has a load balancing feature, which can load a website (or web app) on demand or keep the website always running.

Here is what i found IIS - "Always On" Application Pool

But the feature has no impact on wfastcgi; the application is still restarting.

  • I haven't used FastCGI before (but have some experience with Gunicorn and CGI). From what I've read so far, it seems like an improvement over CGI, which spun up a new process for each request, to handling multiple requests with one process. Moreover, restarts are in the nature of web workers (to handle memory leaks or failures). Wouldn't it be best to cache your API responses in a separate layer? (file/Redis or something else) – gavin Jun 11 '23 at 08:25
  • I have no experience with Redis, but I already tried to export the data to a file with a timestamp. The problem with that is, when the threads get initialized within the same couple of milliseconds (as seen in the log above), they both start up (because the timestamp isn't set by either one yet), double-sending API requests as well as double-writing to the file. – LiiVion Jun 11 '23 at 13:25
  • You can take locks on refreshing the cache and/or making an API call if you want to continue down the file road. – gavin Jun 11 '23 at 16:29
  • After refreshing data, you could *pickle* it to a file. On startup get the pickled data if the file's timestamp is "recent." As an aside: You say you need to make *multiple* API calls to refresh the data. Then presumably while you have made one or more of these calls but not all of them the data might be in an inconsistent state. If so, you need to hold in abeyance such user requests while the data is being refreshed. You might wish to see [this article](https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock). – Booboo Jun 12 '23 at 11:15
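The double-checked locking the commenters suggest could look roughly like this (a sketch, not from the original post; `fetch` stands in for the poster's real API calls). A thread that arrives while a refresh is already in flight, or while the data is still fresh, skips its own refresh instead of duplicating API requests:

```python
# Sketch: ensure only one thread refreshes at a time, and that a
# refresh happens at most once per interval even under a race.
import threading
import time

_refresh_lock = threading.Lock()
_last_refresh = 0.0
REFRESH_INTERVAL = 900  # seconds

def refresh_if_stale(fetch):
    """Call fetch() at most once per REFRESH_INTERVAL across threads."""
    global _last_refresh
    if time.time() - _last_refresh < REFRESH_INTERVAL:
        return False  # data is still fresh
    # non-blocking: a second thread gives up instead of queueing
    if not _refresh_lock.acquire(blocking=False):
        return False
    try:
        # re-check under the lock in case another thread just refreshed
        if time.time() - _last_refresh < REFRESH_INTERVAL:
            return False
        fetch()
        _last_refresh = time.time()
        return True
    finally:
        _refresh_lock.release()

calls = []
# two callers racing: only the first actually fetches
refresh_if_stale(lambda: calls.append("fetched"))
refresh_if_stale(lambda: calls.append("fetched"))
print(calls)  # prints ['fetched']
```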

2 Answers


Why do you want to use the web server's memory to hold data? This pattern is not recommended, as web servers may restart, and data can be lost.

Why not use a cache like Redis or a Database and set the API to pull data/return it to the client?

To ensure the cache/database layer has the latest data, you can set a separate cron job that regularly runs to fetch the data from the external API and populate your data store.

This will avoid the headache of using your web server as a data store.
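A minimal sketch of that separation, using SQLite from the standard library as the shared store (Redis works the same way); the key names and `max_age` value are placeholders, and `save()` is what the scheduled job would call after fetching from the external API, while the Flask handlers only ever call `load()`:

```python
# A separate job populates a shared store; the web workers only read
# from it, so worker restarts never trigger extra API calls.
import json
import sqlite3
import time

DB_PATH = "api_cache.db"  # shared store, outside the web workers' memory

def init_store(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache"
        " (key TEXT PRIMARY KEY, value TEXT, updated REAL)"
    )

def save(conn, key, data):
    # called by the cron job after fetching from the external API
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
        (key, json.dumps(data), time.time()),
    )
    conn.commit()

def load(conn, key, max_age=900):
    # called by the request handlers; never talks to the external API
    row = conn.execute(
        "SELECT value, updated FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row and time.time() - row[1] < max_age:
        return json.loads(row[0])
    return None  # stale or missing: the scheduled job will refresh it

conn = sqlite3.connect(DB_PATH)
init_store(conn)
save(conn, "endpoint1", {"items": [1, 2, 3]})
print(load(conn, "endpoint1"))  # prints {'items': [1, 2, 3]}
```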


Now, if you still want to go down this path, ensure your Flask app is not hot-reloading/watching for file changes, which can happen if you are running in debug mode. Make sure you use the production configuration.

Another area to explore is the IIS/FastCGI settings. Are you watching for file changes in the root directory or anywhere else? This could be the cause of the restarts.
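For example, wfastcgi documents an app setting that controls which file changes trigger a reload of the Python process; if logs or a cache file are written under the app root, a pattern that matches them will cause restarts. This is a sketch based on the wfastcgi README (the key name and the exact default regex should be verified against its documentation):

```xml
<configuration>
  <appSettings>
    <!-- Restrict which file changes restart the wfastcgi process.
         Narrow this so that log or cache files written into the app
         folder do not match the pattern. -->
    <add key="WSGI_RESTART_FILE_REGEX" value=".*((\.py)|(\.config))$" />
  </appSettings>
</configuration>
```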


After various attempts and recommendations to use some kind of cache/file export, I implemented caching in the web app, and since then it works great.

I already used a session for my API requests, so I simply changed from a normal session to a cached session from requests_cache.

Here is an example of what I did:

from requests_cache import CachedSession

class Api:
    def __init__(self):
        self.API_KEY = 'xxx-xxx'
        self.BASE_URL = 'https://xxxxxxxx.com/3'
        self.HEADER = {
            'X-Api-Key': f'{self.API_KEY}',
            'Accept': 'application/json'
        }

        # Session cache setup (when data expires)
        self.default_expire_after = 900
        self.urls_expire_after = {
            f'{self.BASE_URL}/endpoint1/': 900,
            f'{self.BASE_URL}/endpoint2/': 1800,
            f'{self.BASE_URL}/endpoint3/': 3600
        }


        # Session that creates a cache file in the root dir in sqlite format
        self.session = CachedSession('cache',
                                     backend='sqlite',
                                     expire_after=self.default_expire_after,
                                     urls_expire_after=self.urls_expire_after)
        self.session.headers.update(self.HEADER)

The API responses with all the data are stored in the cache, and when the data expires the session sends out a new API request. If the data isn't expired, everything is served from the cache.

This has two major improvements:

  • reduced load on the API servers, since fewer API requests are sent
  • faster response times, since the session fetches the data from the cache and does not need to wait for the API to respond
  • Won't the cache be deleted during the restart of the webserver? The documentation for the requests_cache library also mentions persistence using an [external data store](https://requests-cache.readthedocs.io/en/stable/user_guide/backends.html). You seem to be still relying on the [in-memory](https://requests-cache.readthedocs.io/en/stable/modules/requests_cache.backends.base.html#requests_cache.backends.base.BaseCache) implementation, which will be lost each restart. – user7434398 Jun 13 '23 at 13:00
  • No, the cache won't be deleted, because I used a `backend` in the CachedSession. In my example I used `SQLite`, which creates an `.sqlite` file where everything is cached (timestamp, data, headers, etc.). There is also the option to use `memory` as the backend, which would be deleted on restart. – LiiVion Jun 14 '23 at 06:06