I think that you can do this with an affine transformation. I am basing this mostly on similar work from https://github.com/VolkerH/Lattice_Lightsheet_Deskew_Deconv. I believe there is an example of this kind of scaling transform in one of their example notebooks. You will need to figure out the shape of the output volume but hopefully this gets you unstuck
import numpy as np
from scipy.ndimage import affine_transform
# create a transformation matrix
scale = np.eye(4)
scale[0, 0] = xy_voxelsize / z_voxelsize
# perform affine transformation
isotropic_vol = affine_transform(anisotropic_vol, np.linalg.inv(scale), output_shape=?, order=1)
I think that you can find some useful functions for determining the output shape from the same repository I mentioned above. This might be a little overkill for what you want but I have had good experience with much of the code there. You could probably figure out the output shape on your own pretty easily.
I'm not sure how many images you have, but these seem pretty big. If you have access to a GPU, the gputools version (https://github.com/maweigert/gputools) of affine transformation could also make your life easier.