This post helped me achieve what I wanted, but the implementation is slow for some of the large datasets I work on. I have two NumPy arrays (fairly large):
p[:24]=array([[ 0.18264738, -0.00326727, 0.01799096],
[ 0.18198644, -0.00051316, 0.01800063],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18215604, 0.00157497, 0.01799999],
[ 0.18286349, 0.0036474 , 0.01799824],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18399446, 0.00528562, 0.01799998],
[ 0.18573835, 0.0068323 , 0.01799908],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18573835, 0.0068323 , 0.01799908],
[ 0.18744153, 0.00758001, 0.018 ],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18744153, 0.00758001, 0.018 ],
[ 0.18956973, 0.00801727, 0.01800126],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.19157426, 0.0078435 , 0.018 ],
[ 0.19366005, 0.00714792, 0.01800038],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.19584496, 0.0055142 , 0.01799665],
[ 0.19701494, 0.00384344, 0.01800058],
[ 0.19366005, 0.00714792, 0.01800038],
[ 0.19584496, 0.0055142 , 0.01799665],
[ 0.18999948, 0. , 0.0226188 ]])
v[:24]=array([[ 0.18264738, -0.00326727, 0.01799096],
[ 0.18198644, -0.00051316, 0.01800063],
[ 0.18999948, 0. , 0.0226188 ],
[ 0.18215604, 0.00157497, 0.01799999],
[ 0.18286349, 0.0036474 , 0.01799824],
[ 0.18399446, 0.00528562, 0.01799998],
[ 0.18573835, 0.0068323 , 0.01799908],
[ 0.18744153, 0.00758001, 0.018 ],
[ 0.18956973, 0.00801727, 0.01800126],
[ 0.19157426, 0.0078435 , 0.018 ],
[ 0.19366005, 0.00714792, 0.01800038],
[ 0.19584496, 0.0055142 , 0.01799665],
[ 0.19701494, 0.00384344, 0.01800058],
[ 0.19775054, 0.0019907 , 0.01800372],
[ 0.19800517, -0.00065405, 0.01800135],
[ 0.19731225, -0.00330035, 0.01799999],
[ 0.19596213, -0.00537427, 0.01800001],
[ 0.18937038, -0.00797523, 0.018 ],
[ 0.18739267, -0.00759293, 0.01799974],
[ 0.18565072, -0.00671446, 0.018 ],
[ 0.18411626, -0.00545196, 0.01800367],
[ 0.19136006, -0.00791202, 0.01799961],
[ 0.1938769 , -0.00702934, 0.01799973],
[ 0.1314003 , -0.06724723, 0.0645 ]])
The v array is generated from the p array using:
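import numpy as np

# Unique rows of p (sorted lexicographically), plus the index of each row's first occurrence in p
p_uniques, p_indices, p_inverse, p_counts = np.unique(
    p, return_index=True,
    return_inverse=True,
    return_counts=True,
    axis=0)
# Keep the unique rows in the order in which they first appear in p
v = p[np.sort(p_indices, axis=None)]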
Now, the goal is to build an array that gives, for every row of p (duplicates included), the index of the matching row in v. The desired output is therefore:
indices[:24]=array([ 0, 1, 2, 3, 4, 2, 5, 6, 2, 6, 7, 2,
7, 8, 2, 9, 10, 2, 2, 11, 12, 10, 11, 2])
I have posted only the first 24 entries of the indices array to save space.
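To pin down exactly what I mean, here is a minimal brute-force sketch on made-up toy data. The array values and the helper name `reference_indices` are illustrative assumptions, not part of my real code. It compares every row of p with every row of v, so it needs memory proportional to len(p) * len(v) and only serves as a definition of the expected output, not as a solution for large arrays:

```python
import numpy as np


def reference_indices(p, v):
    """For each row of p, return the index of the identical row in v.

    Hypothetical helper for illustration only: brute-force comparison,
    assumes every row of p occurs exactly once in v.
    """
    # matches[i, j] is True when row i of p equals row j of v
    matches = (p[:, None, :] == v[None, :, :]).all(axis=2)
    # argmax returns the position of the first True in each row
    return matches.argmax(axis=1)


# Toy data (made up): rows repeat and do not first appear in sorted order
p = np.array([[3.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [3.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])

# Same construction of v as above: unique rows in order of first appearance
_, p_indices = np.unique(p, return_index=True, axis=0)
v = p[np.sort(p_indices, axis=None)]

indices = reference_indices(p, v)
print(indices)  # [0 1 0 2 1]
```

Here v[indices] reproduces p row for row, which is the property any faster version has to keep.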
I tried various methods using np.where, np.isin, and others, but none of them produced the desired result faster than the solution shared in the linked post.
I'd greatly appreciate your help.