I have searched around a bit for tutorials etc. to help with this problem but cant seem to find anything.
I have two lists of n-dimensional numpy arrays (3D array form of some images) and am wanting to check for overlapping images within each list. Lets says list a is a training set and list b is a validation set.
One solution is just to use a nested loop and check if each pair of arrays is equal using np.array(a[i], b[j])
but that is slow (each list has about 200,000 numpy arrays in it) and frankly quite disgusting.
I was thinking a more elegant way of achieving this would be to hash each of the numpy arrays within each list and then compare each entry using these hash tables.
Firstly, is this solution correct, and secondly, how would I go about achieving this?
An example of some of the data is below.
train_dataset[:3]
array([[[-0.5 , -0.49607843, -0.5 , ..., -0.5 ,
-0.49215686, -0.5 ],
[-0.49607843, -0.47647059, -0.5 , ..., -0.5 ,
-0.47254902, -0.49607843],
[-0.49607843, -0.49607843, -0.5 , ..., -0.5 ,
-0.49607843, -0.49607843],
...,
[-0.49607843, -0.49215686, -0.5 , ..., -0.5 ,
-0.49215686, -0.49607843],
[-0.49607843, -0.47647059, -0.5 , ..., -0.5 ,
-0.47254902, -0.49607843],
[-0.5 , -0.49607843, -0.5 , ..., -0.5 ,
-0.49607843, -0.5 ]],
[[-0.5 , -0.5 , -0.5 , ..., 0.48823529,
0.5 , 0.1509804 ],
[-0.5 , -0.5 , -0.5 , ..., 0.48431373,
0.14705883, -0.32745099],
[-0.5 , -0.5 , -0.5 , ..., -0.32745099,
-0.5 , -0.49607843],
...,
[-0.5 , -0.44901961, 0.1509804 , ..., -0.5 ,
-0.5 , -0.5 ],
[-0.49607843, -0.49607843, -0.49215686, ..., -0.5 ,
-0.5 , -0.5 ],
[-0.5 , -0.49607843, -0.48823529, ..., -0.5 ,
-0.5 , -0.5 ]],
[[-0.5 , -0.5 , -0.5 , ..., -0.5 ,
-0.5 , -0.5 ],
[-0.5 , -0.5 , -0.5 , ..., -0.5 ,
-0.5 , -0.5 ],
[-0.5 , -0.5 , -0.49607843, ..., -0.5 ,
-0.5 , -0.5 ],
...,
[-0.5 , -0.5 , -0.5 , ..., -0.48823529,
-0.5 , -0.5 ],
[-0.5 , -0.5 , -0.5 , ..., -0.5 ,
-0.5 , -0.5 ],
[-0.5 , -0.5 , -0.5 , ..., -0.5 ,
-0.5 , -0.5 ]]], dtype=float32)
I appreciate the help in advance.