
I have a situation where multiple sources will need to read from the same (small) data source, possibly at the same time. For example, several different computers call a function that needs to read from an external data source (e.g. an Excel file). Since multiple different sources are involved, I cannot simply read from the data source once and pass it into the function---it must be loaded inside the function.

Is there a data source that can handle this effectively? A pandas DataFrame is an acceptable format for the information that needs to be read, so I tried storing that DataFrame in an sqlite3 database, since according to the sqlite3 website, sqlite3 databases can handle concurrent reads. Unfortunately, it fails too often. I tried multiple different iterations and simply could not get it to work.
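For reference, a minimal sketch of roughly how such an sqlite3 attempt can be structured (the table name `lookup`, the path `shared_data.db`, and the read-only URI connection are assumptions for illustration, not details from my actual code):

```python
import sqlite3
import pandas as pd

DB_PATH = "shared_data.db"  # hypothetical path

def write_table(df: pd.DataFrame) -> None:
    # One-time setup: persist the DataFrame to an sqlite3 file
    conn = sqlite3.connect(DB_PATH)
    try:
        df.to_sql("lookup", conn, if_exists="replace", index=False)
    finally:
        conn.close()

def read_table() -> pd.DataFrame:
    # Called from each worker/computer; mode=ro opens the file read-only,
    # so the connection never takes a write lock
    conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
    try:
        return pd.read_sql("SELECT * FROM lookup", conn)
    finally:
        conn.close()
```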

Is there another data format/source that would be effective here? I tried scouring the internet for whether something as simple as an Excel file plus pandas' read_excel function could handle this type of concurrency, but I could not find any information. I ran an experiment using a multiprocessing pool to simultaneously load the same very large (roughly one-minute load time) Excel file, and it did not crash. But of course, that is not exactly a perfect experiment.
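A sketch of that kind of experiment, under the assumption of a hypothetical file `big_file.xlsx` and 8 concurrent workers, might look like this:

```python
from multiprocessing import Pool
import pandas as pd

XLSX_PATH = "big_file.xlsx"  # hypothetical path

def load(_):
    # Each worker opens and parses the same workbook independently
    return len(pd.read_excel(XLSX_PATH))

if __name__ == "__main__":
    with Pool(8) as pool:
        # 8 simultaneous readers of the same file
        print(pool.map(load, range(8)))
```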

Thanks!


1 Answer


You can try using openpyxl's read-only mode. It uses a generator instead of loading the whole file into memory.
Also take a look at processing large xlsx file in python
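A minimal sketch of read-only streaming with openpyxl (the path and sheet arguments are placeholders):

```python
from openpyxl import load_workbook

def iter_rows(path, sheet=None):
    # read_only=True streams rows lazily instead of building the full
    # in-memory workbook model; data_only=True returns cached cell values
    wb = load_workbook(path, read_only=True, data_only=True)
    ws = wb[sheet] if sheet else wb.active
    try:
        for row in ws.iter_rows(values_only=True):
            yield row  # each row is a tuple of cell values
    finally:
        # read-only workbooks keep the file handle open until closed
        wb.close()
```

Because each caller only ever opens the .xlsx for reading, several processes iterating the same file this way should not contend for locks the way a writable database connection can.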