Regarding the part of your question where you say "I'm not sure it's working properly anyways":
I believe you have a bug on the line where you check for an exact match:
if np.equal(valid_dataset, train_dataset[img]).any(1).all()
For illustration purposes, let's consider a toy example where all images are 5x5 and there are only 3 images in valid_dataset. Let's walk through the steps of your check when the tested image is contained in valid_dataset, e.g. when it is identical to the 2nd image of valid_dataset. In this case np.equal(valid_dataset, train_dataset[img]) might give us, e.g.,
[[[ True False  True False False]
  [False False  True False False]
  [False False  True False False]
  [False False False  True False]
  [False False False False False]]

 [[ True  True  True  True  True]
  [ True  True  True  True  True]
  [ True  True  True  True  True]
  [ True  True  True  True  True]
  [ True  True  True  True  True]]

 [[ True  True  True  True False]
  [ True  True  True False  True]
  [ True  True False  True False]
  [ True False  True False False]
  [False  True False False  True]]]
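The 3x5x5 shape of that result comes from NumPy broadcasting: the single 5x5 image is compared against each of the 3 images in turn. A minimal sketch of this step, assuming made-up toy data (the arrays below are illustrative stand-ins, not your real datasets, and test_image is a hypothetical name for train_dataset[img]):

```python
import numpy as np

# Hypothetical toy data: 3 validation images of 5x5 pixels, all distinct.
valid_dataset = np.arange(75).reshape(3, 5, 5)

# The tested image is an exact copy of the 2nd validation image.
test_image = valid_dataset[1].copy()

# Broadcasting compares the one 5x5 image with each of the 3 images.
eq = np.equal(valid_dataset, test_image)

print(eq.shape)     # (3, 5, 5)
print(eq[1].all())  # True: every pixel of the 2nd image matches
print(eq[0].any())  # False: no pixel of the 1st image matches
```

Each 5x5 slice eq[k] is the pixel-by-pixel comparison with the k-th validation image, which is what the printout above shows.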
Next, you apply .any(1) to this 3D result. This operation reduces along the 2nd dimension (axis 1, i.e. the rows of each image) and tells us whether at least one of the values it collapses is True. Hence, the result of applying .any(1) has shape 3x5; e.g., the value at position [0,0] is the logical OR of the following values from our example:
[[[ True ..... ..... ..... .....]
  [False ..... ..... ..... .....]
  [False ..... ..... ..... .....]
  [False ..... ..... ..... .....]
  [False ..... ..... ..... .....]]
 [...]
 [...]]
Thus, in our toy example the result of applying .any(1) would be
[[ True False  True  True False]
 [ True  True  True  True  True]
 [ True  True  True  True  True]]
Applying .all() to that results in False, even though the image is contained in valid_dataset.
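The walkthrough above can be replayed directly on the boolean array from the toy example. This is only a sketch: eq is typed in by hand to match the printout, and the names T, F, eq and reduced are introduced here for illustration.

```python
import numpy as np

T, F = True, False

# The comparison result from the toy example, entered by hand.
eq = np.array([
    [[T, F, T, F, F],
     [F, F, T, F, F],
     [F, F, T, F, F],
     [F, F, F, T, F],
     [F, F, F, F, F]],
    [[T, T, T, T, T]] * 5,   # the 2nd image matches in every pixel
    [[T, T, T, T, F],
     [T, T, T, F, T],
     [T, T, F, T, F],
     [T, F, T, F, F],
     [F, T, F, F, T]],
])

# .any(1) ORs the values along axis 1, leaving a 3x5 array.
reduced = eq.any(1)
print(reduced.shape)        # (3, 5)
print(reduced[0].tolist())  # [True, False, True, True, False]

# .all() then requires every entry to be True, which fails here.
print(reduced.all())        # False
```

The two False entries in the first row come from a non-matching image, yet they are enough to make the whole check fail.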
Correct solution:
What you want to check is that all pixels of the tested image are the same as the pixels of at least one image in valid_dataset. In other words, for a match (i.e. for the condition to be True) you require that, for at least one valid_dataset image, all values over the plane given by the 2nd and the 3rd dimensions are equal to the corresponding pixels of the tested image. Therefore, the condition should be
if np.equal(valid_dataset, train_dataset[img]).all(axis = (1,2)).any()
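To see the difference between the two conditions, here is a small self-contained sketch on made-up data. The helper names is_in_dataset and old_check and the toy arrays are assumptions for illustration only; passing a tuple as axis is supported by reasonably recent NumPy versions.

```python
import numpy as np

def is_in_dataset(dataset, image):
    """Return True if `image` equals at least one image in `dataset`."""
    # all(axis=(1, 2)): one boolean per dataset image ("every pixel matches").
    # any(): True if that holds for at least one dataset image.
    return bool(np.equal(dataset, image).all(axis=(1, 2)).any())

def old_check(dataset, image):
    """The original condition, kept only for comparison."""
    return bool(np.equal(dataset, image).any(1).all())

# Hypothetical toy data: 3 distinct 5x5 images.
valid_dataset = np.arange(75).reshape(3, 5, 5)

contained = valid_dataset[1].copy()      # exact copy of the 2nd image
not_contained = valid_dataset[1].copy()
not_contained[0, 0] = -1                 # differs in a single pixel

print(is_in_dataset(valid_dataset, contained))      # True
print(is_in_dataset(valid_dataset, not_contained))  # False
print(old_check(valid_dataset, contained))          # False (misses the match)
```

If you also need to know which image matched, np.equal(dataset, image).all(axis=(1, 2)) gives one boolean per dataset image, so np.flatnonzero on it returns the matching indices.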