
I am using two dynos (web and worker). The web dyno handles requests using a Flask app. The worker runs basic Python code which outputs a .csv file every 5 minutes. This file is quite small (<1MB). The Flask app needs to read this .csv file in order to serve requests. The question is: what is the most efficient way to do this?

From what I understand, dynos are isolated from each other. Secondly, Heroku has an ephemeral filesystem, which is fine for my application: the .csv files need not persist between restarts, and they need not be backed up. The .csv file written out by the worker is not visible in ls output (after doing heroku run bash), most probably because the web and worker dynos are isolated from each other. After spending some time researching online, I think there are three options:

(1) Use AWS S3: Is this a good option for active files? The files are supposed to be written and read at high frequency; wouldn't S3 be slow for my application? Secondly, I want to write (and read back) from within the Python code, and I am not sure how to do this with S3. It seems that S3 is storage for static files that your application needs.

(2) Postgres: Do the worker and web dynos share this storage? Can one write a .csv file to it, or does it have to be SQL?

(3) Redis: I do not understand this one. It seems to be something that uses queues to communicate between dynos. Can it be used to pass data?

My Flask application has to read the file only when a request comes in from the front end. I am not sure how a Redis queue can help with this.

Thanks.


1 Answer

Redis certainly has good queuing features. But at its heart it is a REmote DIctionary Server, hence the name.

Your worker can put that .csv file into Redis, and your web dyno can retrieve it.

    r.set('csv', serialized_data)

puts it there, and

    serialized_data = r.get('csv')

gets it back. 'csv' is the Redis key. You can have as many keys as you want within the limits of your Heroku Redis plan (more keys and more data mean higher cost).

Redis is efficient enough that your web dyno can just get() the data every time it needs it. It will always get the most recent data (unless the worker hasn't put it there yet for the first time after startup). There is no need for elaborate caching in your web dyno unless your load is very high.
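
By way of illustration, here is a minimal end-to-end sketch of that pattern. It assumes the redis-py client and that both dynos attach to the same Heroku Redis add-on through its REDIS_URL config var; publish_csv and load_csv are hypothetical helper names:

    import os
    import redis

    # Both dynos connect to the same Heroku Redis instance via REDIS_URL.
    r = redis.from_url(os.environ["REDIS_URL"])

    # Worker dyno: call after each .csv is generated (every 5 minutes).
    def publish_csv(path):
        with open(path, "rb") as f:
            r.set("csv", f.read())  # overwrites the previous value

    # Web dyno (inside a Flask view): read on each request.
    def load_csv():
        return r.get("csv")  # bytes, or None if the worker hasn't run yet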

  • Thanks for the pointer. Ideally I want to pass the csv as a pandas DataFrame. Apparently there is a way to do this using pyarrow: [link](https://stackoverflow.com/questions/57949871/how-to-set-get-pandas-dataframes-into-redis-using-pyarrow/57986261#57986261) (see the sketch after these comments). – P W Oct 20 '20 at 00:40
  • Yeah, perfect. That kind of serialization is exactly what you want. – O. Jones Oct 20 '20 at 10:35
  • I could serialize and deserialize a pandas DataFrame using the information in the link in the first comment. However, I am still looking for a solution using either Postgres or S3. Thanks. – P W Oct 20 '20 at 18:24
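
For reference, a minimal sketch of that DataFrame round trip using pyarrow's IPC (Arrow stream) format, assuming an already-connected redis-py client r and the same 'csv' key as in the answer; df_to_bytes and bytes_to_df are hypothetical helper names:

    import pyarrow as pa

    # Worker side: serialize the DataFrame to Arrow's IPC stream format.
    def df_to_bytes(df):
        table = pa.Table.from_pandas(df)
        sink = pa.BufferOutputStream()
        with pa.ipc.new_stream(sink, table.schema) as writer:
            writer.write_table(table)
        return sink.getvalue().to_pybytes()

    # Web side: deserialize the bytes back into a DataFrame.
    def bytes_to_df(data):
        return pa.ipc.open_stream(data).read_all().to_pandas()

    # Usage: r.set('csv', df_to_bytes(df)); later, bytes_to_df(r.get('csv'))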