Finding Max Shape of Multiple .npz files

Question

I have a number of .npz files that potentially vary in shape, and I'd like to find which file has the largest shape. Each .npz contains 2 arrays, and I'm looking for the largest of the 2nd. The following snippet works, but it takes longer than I expected to return the shapes. Is this the most efficient way of achieving this? I'm worried about scaling because it currently takes a couple of seconds to find the max shape[1], and I'm only looping through 4 arrays.

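import numpy as np

frameMax = 0
for f in npzs:  # npzs: the .npz files to inspect
    d = np.load(f, mmap_mode='r')
    # Track the largest second dimension seen so far
    if d['arr_0'].shape[1] > frameMax:
        frameMax = d['arr_0'].shape[1]
    d = None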

score 1 · Accepted Answer · edited May 23 '17 at 12:33

Bear in mind that I/O operations can be relatively slow. That said, you can reduce the logic for finding the maximum to the following using the builtin max, which runs in O(n) time and removes the need for the assignments you made:

frameMax = max([np.load(f, mmap_mode='r')['arr_0'].shape[1] for f in npzs])
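If the I/O cost is the bottleneck, one possible way to cut it is to avoid loading the array data at all. As far as I know, mmap_mode has no effect on .npz archives, so indexing d['arr_0'] reads the whole array just to get its shape. The sketch below reads only the .npy header of the archive member instead, using the standard zipfile module and the header helpers in numpy.lib.format. The function names npz_member_shape and max_second_dim are hypothetical and chosen for illustration; the member name 'arr_0' and the shape[1] index are taken from the question.

```python
import zipfile

import numpy as np
from numpy.lib import format as npy_format


def npz_member_shape(path, member="arr_0"):
    """Return the shape of one array stored in an .npz file.

    Only the small .npy header of the member is read; the array
    data itself is never loaded. (Hypothetical helper.)
    """
    with zipfile.ZipFile(path) as archive:
        # Each array in an .npz is stored as "<name>.npy" inside the zip.
        with archive.open(member + ".npy") as fp:
            version = npy_format.read_magic(fp)
            if version == (1, 0):
                shape, _, _ = npy_format.read_array_header_1_0(fp)
            else:
                # Assumption: later format versions use the same
                # header length layout as version 2.0.
                shape, _, _ = npy_format.read_array_header_2_0(fp)
    return shape


def max_second_dim(paths, member="arr_0"):
    """Return the largest shape[1] of `member` across the given .npz files."""
    return max(npz_member_shape(p, member)[1] for p in paths)
```

With this in place, frameMax = max_second_dim(npzs) would replace the loop. Whether it is noticeably faster depends on how large the arrays are, since the saving comes from skipping the array data rather than from the max logic.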

answered Oct 13 '16 at 14:47

Moses Koledoye

You're probably correct and it's most likely an I/O bottleneck. This doesn't really speed anything up, but it definitely looks more sleek. Thanks, @moses-koledoye – nanoPhD Oct 13 '16 at 14:54
