1

I'd like to perform some array calculations using NumPy for a view callable in Pyramid. The array I'm using is quite large (3500x3500), so I'm wondering where the best place to load it is for repeated use.

Right now my application is a single page and I am using a single view callable.

The array will be loaded from disk and will not change.

abroekhof
  • Need more details. Are you running this multi-threaded? Is your array a singleton that should be shared between threads and never modified? Are you loading from a file on disk or getting it as the result of some other computation? – Mu Mind Sep 19 '12 at 15:17
  • Answer to your ImportError question: http://stackoverflow.com/questions/8710918/installing-numpy-as-a-dependency-with-setuptools – Mu Mind Sep 19 '12 at 15:23
  • Thanks for the link: answers to your previous comment are in the original question. I don't know about threading, I need to read into that. – abroekhof Sep 19 '12 at 15:28

2 Answers

3

If the array can be shared between threads, you can store it in the registry at application startup (config.registry['my_big_array'] = ??). If it cannot be shared, I'd suggest a queuing system with workers that always have the data loaded, probably in another process. You can hack around this by making the registry value a threadlocal and storing a new array in it if one is not already there, but then you get one copy of the array per thread, which is really not a great idea for something that large.
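A minimal sketch of the registry approach, assuming the array was saved with np.save. The file name big_array.npy and the names load_array, includeme and my_view are placeholders for illustration, not something from the question. Marking the array read-only is an extra safety step so that threads can share it without locking.

```python
import numpy as np


def load_array(path):
    """Load the array once and mark it read-only so threads can share it."""
    arr = np.load(path)          # assumption: the file was written with np.save
    arr.setflags(write=False)    # an accidental write now raises ValueError
    return arr


def includeme(config, path="big_array.npy"):  # hypothetical default path
    # Runs once at startup; the registry can be used like a dict.
    config.registry["my_big_array"] = load_array(path)


def my_view(request):
    # Every request reads the shared array; nothing is reloaded from disk.
    arr = request.registry["my_big_array"]
    return {"total": float(arr.sum())}
```

In a real application the includeme function would be picked up with config.include(...), and the view would need a renderer (for example renderer='json') since it returns a dict.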

Michael Merickel
  • I'm very new to Pyramid (and frameworks in general) so I'm trying to parse your answer. If something is shared between threads, it just needs to never change? If so then yes: the array will be the same for all threads. – abroekhof Sep 19 '12 at 15:25
  • It's basic multithreading principles at work here. If it is not read-only then you need a locking mechanism to synchronize changes to the array. Most WSGI servers use one thread per request, but some may use extra processes (where the array would not be shared), so if it is not read-only you need to be aware of what's going on. – Michael Merickel Sep 19 '12 at 15:34
2

I would just load it in the obvious place in the code, where you need to use it (in your view, I guess?) and see if you have performance problems. It's better to work with actual numbers than try to guess what's going to be a problem. You'll usually be surprised by the reality.

If you do see performance problems, assuming you don't need a copy for each of multiple threads, try just loading it in the global scope after your imports. If that doesn't work, try moving it into its own module and importing that. If that still doesn't help... I don't know what then.
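A rough sketch of the "own module" idea, written as a lazy variation: instead of loading at import time, the file is read on the first call and the same object is returned afterwards. ARRAY_PATH and get_array are made-up names, and the sketch assumes the array was saved with np.save.

```python
import functools

import numpy as np

ARRAY_PATH = "big_array.npy"  # hypothetical location of the saved array


@functools.lru_cache(maxsize=None)
def get_array(path=ARRAY_PATH):
    """Read the file on the first call; later calls return the cached object."""
    arr = np.load(path)
    arr.setflags(write=False)  # read-only, so sharing between threads is safe
    return arr
```

A view would then call get_array() instead of loading the file itself. Two threads racing on the very first call could each load the file once; a plain module-level `BIG_ARRAY = np.load(ARRAY_PATH)` avoids that at the cost of a slower import.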

Mu Mind
  • A 3500x3500 array of floats (64 bit) is going to be about 100 MB in size (3500 × 3500 × 8 bytes = 98 MB). One could guess that loading it from disk in the view function (on every request) is not going to be super-fast... – Sergey Sep 20 '12 at 00:37
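
The size estimate can be checked without allocating anything, assuming the array really is float64:

```python
import numpy as np

rows = cols = 3500
# itemsize is the number of bytes per element (8 for float64)
n_bytes = rows * cols * np.dtype(np.float64).itemsize
print(n_bytes)        # 98000000
print(n_bytes / 1e6)  # 98.0 (megabytes)
```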