
I'm writing a "webapp" for my own personal use, meant to be run with my own computer as the server. It's basically a nice interface for data visualization. The app needs to manipulate large matrices (about 100 MB) in Python and return the computation results to the browser for visualization. Currently I'm just storing the data in a CSV file and loading it into pandas every time I want to use it, but this is very slow (about 15 seconds). Is there a way to have this object (a pandas.DataFrame) persist in memory, or does that not make sense?
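
For context, the current pattern looks roughly like this; a minimal sketch assuming a Flask route, with matrix.csv and the sum() computation as placeholders:

    import pandas as pd
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/compute")
    def compute():
        # The whole 100 MB CSV is re-parsed on every request (~15 s).
        df = pd.read_csv("matrix.csv")  # placeholder path
        result = {c: float(v) for c, v in df.sum().items()}  # placeholder computation
        return jsonify(result)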

I tried memcached, and I don't think it's appropriate here. I also tried Redis, but reading from the cache turns out to be no faster than reading the file: if I store each matrix row under its own key, reads are just as slow, and if I store the whole matrix as a string under a single key, reconstructing the array from that string takes as long as parsing the CSV. So no gains either way.
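
A minimal sketch of the single-key Redis variant described above, assuming redis-py and a Redis server on localhost; the file path is a placeholder:

    import io

    import pandas as pd
    import redis

    r = redis.Redis()  # assumes a local Redis server

    # Store the whole matrix as one CSV string under a single key.
    df = pd.read_csv("matrix.csv")  # placeholder path
    r.set("matrix", df.to_csv(index=False))

    # Reading it back still means re-parsing CSV text, which is about
    # as slow as reading the file from disk in the first place.
    cached = pd.read_csv(io.StringIO(r.get("matrix").decode()))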

– jclancy
Comment: Does the file need to be so large? Can you break it up into multiple smaller files, so that you can 'chunk'-load them and load different ones in the background? That way, when you need a different section, it's ready. – g19fanatic May 24 '16 at 21:21
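
One way to read this suggestion, sketched with pandas' chunksize parameter; the chunk size and handle_chunk are placeholders:

    import pandas as pd

    def handle_chunk(chunk):
        # Placeholder: per-section work, e.g. pre-computing a view.
        print(chunk.shape)

    # Parse the CSV in pieces so a needed section can be ready sooner,
    # instead of blocking ~15 seconds on the whole file.
    for chunk in pd.read_csv("matrix.csv", chunksize=100_000):
        handle_chunk(chunk)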

1 Answer


Considering the app is meant to run on your own computer, there are two options you could try:

  1. Use Werkzeug's FileStorage data structure: http://werkzeug.pocoo.org/docs/0.11/datastructures/#werkzeug.datastructures.FileStorage
  2. Run SQLite on some kind of RAM filesystem. This also allows the data to be changed quickly (see the sketch after this list).
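
A minimal sketch of option 2, assuming a Linux tmpfs mount at /dev/shm; the paths, table name, and query are placeholders:

    import sqlite3

    import pandas as pd

    # /dev/shm is a tmpfs (RAM-backed) mount on most Linux systems,
    # so the database file itself lives in memory.
    conn = sqlite3.connect("/dev/shm/matrix.db")

    # One-time load: parse the CSV once and persist it to SQLite.
    df = pd.read_csv("matrix.csv")  # placeholder path
    df.to_sql("matrix", conn, if_exists="replace", index=False)

    # Later reads can pull just the rows a visualization needs,
    # instead of re-parsing the whole 100 MB CSV.
    subset = pd.read_sql("SELECT * FROM matrix LIMIT 1000", conn)

This keeps reads fast across requests because only the slice you ask for is deserialized, and updates go through normal SQL instead of rewriting the whole file.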
– Igor