I have a number of .npz files that potentially vary in shape, and I'd like to find which file has the largest shape. Each .npz has 2 arrays in it, and I'm looking for the largest of the 2nd. The following snippet works, but it takes longer than I expected to return the shapes. Is this the most efficient way of achieving this? I'm worried about scaling, because it currently takes a couple of seconds to find the max shape[1] and I'm only looping through 4 arrays.

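import numpy as np

frameMax = 0
for f in npzs:  # npzs: list of .npz file paths
    d = np.load(f, mmap_mode='r')
    # Track the largest second dimension seen so far
    if d['arr_0'].shape[1] > frameMax:
        frameMax = d['arr_0'].shape[1]
    d = None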
nanoPhD

1 Answer

Bear in mind that I/O operations can be relatively slow. That said, you can reduce the logic for finding the maximum to a single expression using the builtin max, which runs in O(n) time and removes the need for the manual assignments:

frameMax = max(np.load(f, mmap_mode='r')['arr_0'].shape[1] for f in npzs)
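
If the I/O really is the bottleneck, one thing that might help is to avoid loading the array data at all. As far as I know, np.load does not memory-map arrays inside a .npz archive (mmap_mode only applies to plain .npy files), so indexing d['arr_0'] reads the whole array just to get its shape. A minimal sketch, assuming each archive member is a standard .npy file in format version 1.0 or 2.0, is to open the zip yourself and parse only the header. The helper names npz_array_shape and max_frames are made up for illustration:

```python
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def npz_array_shape(path, name="arr_0"):
    """Return the shape of one array in a .npz file by reading only its header."""
    with zipfile.ZipFile(path) as zf:
        # Each array in a .npz archive is stored as a member named "<name>.npy"
        with zf.open(name + ".npy") as fp:
            version = npy_format.read_magic(fp)
            if version == (1, 0):
                shape, _, _ = npy_format.read_array_header_1_0(fp)
            elif version == (2, 0):
                shape, _, _ = npy_format.read_array_header_2_0(fp)
            else:
                # Assumption: other format versions are not handled in this sketch
                raise ValueError("unsupported .npy format version: %r" % (version,))
    return shape


def max_frames(paths, name="arr_0"):
    """Largest shape[1] of the named array across all given .npz files."""
    return max(npz_array_shape(p, name)[1] for p in paths)
```

With this, frameMax = max_frames(npzs) should give the same result as the one-liner above, but only a small header is read per file. Whether it is actually faster on your data is something to measure rather than assume.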
Moses Koledoye
  • You're probably correct and it's most likely an I/O bottleneck. This doesn't really speed anything up, but it definitely looks more sleek. Thanks, @moses-koledoye – nanoPhD Oct 13 '16 at 14:54