I want to divide my images into smaller windows which will be sent to a neural net for training (e.g. for training face detectors). I found the tf.extract_image_patches method in TensorFlow, which seemed like exactly what I need. This question explains what it does.
The example there shows an input of (1x10x10x1) (the numbers 1 through 100 in order), given a ksize of (1, 3, 3, 1) and strides of (1, 5, 5, 1). The output is this:
[[[[ 1  2  3 11 12 13 21 22 23]
   [ 6  7  8 16 17 18 26 27 28]]
  [[51 52 53 61 62 63 71 72 73]
   [56 57 58 66 67 68 76 77 78]]]]
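For reference, here is a minimal snippet that reproduces the example above (a sketch assuming the TF 1.x-style API, where tf.extract_image_patches takes ksizes/strides/rates/padding; the call may differ in newer versions):

import numpy as np
import tensorflow as tf

# a 1x10x10x1 "image" holding the numbers 1 through 100
images = np.arange(1, 101, dtype=np.int32).reshape(1, 10, 10, 1)

patches = tf.extract_image_patches(
    images,
    ksizes=[1, 3, 3, 1],
    strides=[1, 5, 5, 1],
    rates=[1, 1, 1, 1],
    padding='VALID',
)

with tf.Session() as sess:
    print(sess.run(patches))  # shape (1, 2, 2, 9): each 3x3 window is
                              # flattened into 9 values in the last dimension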
But I'd expect windows like this (of shape (Nx3x3x1), i.e. N patches/windows of size 3x3):
[[[ 1,  2,  3]
  [11, 12, 13]
  [21, 22, 23]]
 ...
So why are all the patch values stored in 1D? Does that mean this method is not meant for the purpose I described above, and that I can't use it to prepare batches for training? I also found another method for extracting patches, sklearn.feature_extraction.image.extract_patches_2d, and this one really does what I was expecting. So should I understand that these two methods don't do the same thing?
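In case it helps for comparison, this is what I get from sklearn (a sketch; note that extract_patches_2d extracts every stride-1 window, so it is denser than the strided tf example above):

import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

image = np.arange(1, 101).reshape(10, 10)

# every dense (stride-1) 3x3 window, already shaped (N, 3, 3)
patches = extract_patches_2d(image, (3, 3))
print(patches.shape)  # (64, 3, 3)
print(patches[0])     # [[ 1  2  3]
                      #  [11 12 13]
                      #  [21 22 23]]

# For a single-channel image, the flattened tf output above could
# presumably be brought into the same per-window layout with a plain
# reshape, e.g. tf_patches.reshape(-1, 3, 3, 1) (hypothetical name).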