I work for a retail organization. We are developing a control tool that retrieves transactional data and displays the 3 sites whose performance most closely matches that of a chosen test site on a selected parameter (e.g. sales). This currently works fine as an R Shiny app connected to the on-premises database and run from personal laptops. The requirement is to host the tool on Azure so that others can access it easily.
We will be rewriting the tool in Python and plan to retrieve the CSV data (which we have already ingested) from Azure Data Lake Storage Gen1.
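For reference, this is roughly how we picture the retrieval step, using the azure-datalake-store SDK with a service-principal login. The store name, credentials and paths below are placeholders, not our real values:

```python
from azure.datalake.store import core, lib, multithread
import pandas as pd

# Authenticate with a service principal (placeholder credentials).
token = lib.auth(tenant_id='<tenant-id>',
                 client_id='<client-id>',
                 client_secret='<client-secret>')
adl = core.AzureDLFileSystem(token, store_name='<datalake-store-name>')

# Download the CSV to local (App Service) disk, then load it into pandas.
multithread.ADLDownloader(adl,
                          rpath='/ingested/transactions.csv',
                          lpath='transactions.csv',
                          nthreads=16,
                          overwrite=True)
df = pd.read_csv('transactions.csv')
```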
Our understanding is that a pandas DataFrame can only be created from a local file, not directly from storage. That means we have to download each CSV from the data lake, load it into a pandas DataFrame and then run our Python algorithms. The downloaded files would occupy disk space on the App Service, which is limited to 250 GB. The files are fairly large (>5 GB) and there can be several of them, and multiple users will be accessing the tool, so I expect the disk storage to fill up fairly quickly.
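(While drafting this, we did notice that pd.read_csv also accepts file-like objects, so it may be possible to stream a file straight from the lake without writing it to local disk at all; a minimal sketch, reusing the adl client from above:

```python
# Possible alternative: AzureDLFileSystem.open returns a file-like object,
# which pandas can read directly, bypassing local disk entirely.
with adl.open('/ingested/transactions.csv', 'rb') as f:
    df = pd.read_csv(f)
```

We have not yet tested whether this performs acceptably for >5 GB files, so the question below still stands.)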
Is there any way to clear this temporary storage automatically at regular intervals, or should this be managed in the code itself after each execution?
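If it does have to be managed in code, this is the kind of pattern we are considering: a minimal sketch using Python's standard-library tempfile.TemporaryDirectory, which deletes the downloaded copy as soon as each execution finishes (paths are again placeholders):

```python
import os
import tempfile
import pandas as pd
from azure.datalake.store import multithread

def load_csv(adl, rpath):
    # Download into a throwaway directory that is removed, with its
    # contents, when the 'with' block exits, even if the code raises.
    with tempfile.TemporaryDirectory() as tmpdir:
        lpath = os.path.join(tmpdir, os.path.basename(rpath))
        multithread.ADLDownloader(adl, rpath=rpath, lpath=lpath,
                                  nthreads=16, overwrite=True)
        # The DataFrame lives in memory; the local file is gone afterwards.
        return pd.read_csv(lpath)

df = load_csv(adl, '/ingested/transactions.csv')
```

This would bound disk usage to one copy per in-flight request, but we are unsure whether it is the idiomatic approach on App Service or whether the platform offers a built-in scheduled cleanup we should use instead.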