
Since my feature matrix was too large, I stored it in sparse (CSR) form with np.savez. The arrays in the resulting .npz file look like this: indptr: 1, 21, 201, 219, 262, 285 ...; indices: 125, 6, 921, 493, 218, 824 ...

I assumed that whenever an element of indices is smaller than the previous one, a new row begins, because CSR data is read row by row. In indices, 6 is smaller than 125, so the second value should belong to the next row, yet indptr says the second row starts at the 21st value. What could be causing this?
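A minimal sketch of how SciPy's CSR layout assigns entries to rows, using a small hypothetical 2 x 5 matrix rather than the asker's data. Row `i` occupies `data[indptr[i]:indptr[i+1]]` and `indices[indptr[i]:indptr[i+1]]` (in SciPy, `indptr` starts at 0 and has `n_rows + 1` entries). Column indices inside a row need not be increasing, so a drop in `indices` does not by itself signal a new row.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical example: 2 rows x 5 columns.
# Row 0 holds 3 entries whose column indices are stored out of order (3, 0, 4);
# row 1 holds 2 entries (columns 1, 2).
data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
indices = np.array([3, 0, 4, 1, 2])
indptr = np.array([0, 3, 5])  # row i spans positions indptr[i] .. indptr[i+1]-1

m = csr_matrix((data, indices, indptr), shape=(2, 5))

# Index 0 follows index 3 inside row 0, yet no new row starts there:
# only indptr marks where each row begins and ends.
for i in range(m.shape[0]):
    start, end = m.indptr[i], m.indptr[i + 1]
    print(i, m.indices[start:end], m.data[start:end])
# 0 [3 0 4] [10. 20. 30.]
# 1 [1 2] [40. 50.]

print(m.has_sorted_indices)  # False: row 0's column indices are unsorted
print(m.toarray())
```

The dense view shows that the unsorted order is harmless: each value still lands in the column given by its own entry in `indices`.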

Warren Weckesser
  • The indices within a row do not have to be sorted. The attribute `.has_sorted_indices` is True if they are sorted and False otherwise, and there are methods such as `.sort_indices()` to sort them. These apply, of course, to the matrix before saving. – hpaulj Apr 17 '17 at 23:12
  • The newest `scipy` has a `save_npz` function (see http://stackoverflow.com/questions/43014503/save-npz-method-missing-from-scipy-sparse). It uses `savez`, probably as you do; check its code. – hpaulj Apr 17 '17 at 23:16
  • Alright, so if I understand correctly, when the indices within a row are not sorted, the data values in that row are in the same unsorted order, so data and indices still match up. Is that correct? –  Apr 18 '17 at 04:00
  • Yes, they go together. – hpaulj Apr 18 '17 at 04:07
  • Thank you very much. I solved the problem. Can you help me with this issue as well: http://stackoverflow.com/questions/43471419/matlab-crashes-during-importing-large-non-formatted-data –  Apr 18 '17 at 11:57
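A minimal sketch of the points made in these comments, again with a small hypothetical matrix. `sort_indices()` reorders `indices` and `data` together within each row, so every value stays attached to its column. The CSR arrays can then be saved either manually with `np.savez` (the key names `data`, `indices`, `indptr`, `shape` are an illustrative choice, not a required format) or with `scipy.sparse.save_npz` / `load_npz` (available since SciPy 0.19). An in-memory `io.BytesIO` buffer stands in for a file path.

```python
import io

import numpy as np
from scipy.sparse import csr_matrix, load_npz, save_npz

# Hypothetical 2 x 5 matrix with unsorted column indices in row 0.
m = csr_matrix(
    (np.array([10.0, 20.0, 30.0, 40.0, 50.0]),
     np.array([3, 0, 4, 1, 2]),
     np.array([0, 3, 5])),
    shape=(2, 5),
)
dense_before = m.toarray()

# sort_indices() works in place; data is permuted together with indices.
m.sort_indices()
print(m.has_sorted_indices)  # True
print(m.indices, m.data)     # [0 3 4 1 2] [20. 10. 30. 40. 50.]

# Option 1: save the raw CSR arrays yourself with np.savez.
buf = io.BytesIO()
np.savez(buf, data=m.data, indices=m.indices, indptr=m.indptr, shape=m.shape)
buf.seek(0)
with np.load(buf) as f:
    m2 = csr_matrix((f["data"], f["indices"], f["indptr"]),
                    shape=tuple(f["shape"]))

# Option 2: let SciPy store the same arrays for you.
buf2 = io.BytesIO()
save_npz(buf2, m)  # compressed=True by default
buf2.seek(0)
m3 = load_npz(buf2)
```

Note that `np.savez` writes an uncompressed archive; `np.savez_compressed`, which `save_npz` uses by default, is the compressing variant.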
