
I need to write a Python program that analyzes more than 100 MB of data and saves the result as a 100111 × 100111 matrix. The cells need to hold integer values. What should I use?

– Iguana
  • you should use `numpy` – jamylak Jun 07 '13 at 11:16
  • For this 76 GB matrix you can use numpy. Working with such big matrices, however, requires a good understanding of which operations do not copy it. Otherwise you will probably run out of memory pretty fast. – Bort Jun 07 '13 at 11:19
  • If it is a sparse matrix (meaning most of the values are zero), you may want to look into [`scipy.sparse`](http://docs.scipy.org/doc/scipy/reference/sparse.html) – Janne Karila Jun 07 '13 at 11:29
  • If the data is originally 100 MB or so, why do you want to save a multi-gigabyte transformation of it? (I can imagine some reasons, but I think you'll get better answers if you tell us more about the nature of your data and the kind of queries you want to perform on the matrix) – Janne Karila Jun 07 '13 at 11:44
  • Take a look [at this answer](http://stackoverflow.com/a/16633274/832621) or [this other one](http://stackoverflow.com/a/16597695/832621) discussing the usage of `numpy.memmap` to store arrays on disk – Saullo G. P. Castro Jun 07 '13 at 12:08
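
To put rough numbers behind the comments above, here is a minimal sketch (assuming nothing beyond the dimensions in the question) of why the dense array is ~75 GiB and why the `dtype` you pick matters so much:

```python
import numpy as np

n = 100_111  # matrix dimension from the question

# A dense n x n array of 64-bit integers needs n * n * 8 bytes.
bytes_int64 = n * n * np.dtype(np.int64).itemsize
print(f"dense int64: {bytes_int64 / 2**30:.1f} GiB")  # ~74.7 GiB

# A narrower dtype shrinks the footprint proportionally, provided
# the values fit: int8 holds -128..127 in one byte per cell.
bytes_int8 = n * n * np.dtype(np.int8).itemsize
print(f"dense int8:  {bytes_int8 / 2**30:.1f} GiB")   # ~9.3 GiB
```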
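If the matrix really is mostly zeros, a sketch of the `scipy.sparse` route mentioned in the comments could look like the following; the variable name `m` and the sample indices are illustrative only:

```python
import numpy as np
from scipy import sparse

n = 100_111

# lil_matrix supports cheap incremental element assignment;
# only the nonzero entries consume memory.
m = sparse.lil_matrix((n, n), dtype=np.int64)
m[5, 7] = 42
m[100_000, 3] = -1

# Convert to CSR before doing arithmetic or row slicing.
csr = m.tocsr()
print(csr.nnz, "stored values")  # -> 2 stored values
```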
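And a sketch of the `numpy.memmap` approach from the last comment. The filename `matrix.dat` is a placeholder, and note that the backing file will need roughly as much free disk space as the dense array would need RAM:

```python
import numpy as np

n = 100_111

# Create a disk-backed array; only the pages actually touched are
# loaded into RAM. "matrix.dat" is a placeholder filename.
m = np.memmap("matrix.dat", dtype=np.int64, mode="w+", shape=(n, n))

m[0, :10] = np.arange(10)  # writes land in the OS page cache
m.flush()                  # force them out to the file

# Reopen read-only later without loading the whole file:
ro = np.memmap("matrix.dat", dtype=np.int64, mode="r", shape=(n, n))
print(ro[0, :10])
```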

0 Answers