I know that questions along these lines have been asked before, but all of them seem to be outdated. If I run the code:
import numpy as np
a = np.random.randn(10000, 10000)
b = np.random.randn(10000, 10000)
c = a*b
htop shows that only one core is being used, at more or less 100% capacity. So why doesn't NumPy speed up the computation by using more than one thread? Is this caused by my installation of NumPy, or by some library it depends on?
I am running this on Fedora 36.
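For anyone wanting to reproduce my setup, here is a minimal sketch of how the installation could be inspected. It only uses `np.show_config()`, which prints the build-time BLAS/LAPACK information, and looks up a few environment variables commonly associated with threaded BLAS builds. The helper names `numpy_build_info` and `thread_env_vars` are my own, and the list of variables is an assumption about what might be relevant, not an exhaustive set.

```python
import contextlib
import io
import os

import numpy as np


def numpy_build_info():
    """Capture the text that np.show_config() prints to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


def thread_env_vars():
    """Return the thread-count variables that are set (None if unset)."""
    # Assumed list of commonly used variables; others may exist.
    names = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")
    return {name: os.environ.get(name) for name in names}


print("NumPy version:", np.__version__)
print(numpy_build_info())
print(thread_env_vars())
```

If every entry in the dictionary is `None`, the variables are simply not set in the current environment, which matches what I see.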
I tried to implement multithreading manually by splitting the arrays and performing the operations asynchronously, but the results I got were simply incorrect. Reading through the older questions mentioned above, this may have something to do with the BLAS library or its environment variables, but that information looks outdated to me: I couldn't find the library on my machine, nor the environment variables.
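To make the manual approach concrete, here is a minimal sketch of a chunked element-wise multiply. This is not my original code, just an illustration of the idea. It assumes both arrays have the same shape, splits them into row blocks, and has each thread write into its own disjoint slice of a preallocated output so that no two threads touch the same memory. The function name `threaded_multiply` and the parameter `n_chunks` are hypothetical, and whether this gives any speedup depends on NumPy releasing the GIL inside the multiply loop and on memory bandwidth.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def threaded_multiply(a, b, n_chunks=4):
    """Element-wise a * b, computed in row blocks on a thread pool."""
    if a.shape != b.shape:
        raise ValueError("this sketch assumes arrays of identical shape")

    out = np.empty(a.shape, dtype=np.result_type(a, b))
    # Row boundaries of the blocks; the first is 0 and the last is a.shape[0].
    bounds = np.linspace(0, a.shape[0], n_chunks + 1, dtype=int)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # Each block writes only to its own rows of `out`.
        np.multiply(a[lo:hi], b[lo:hi], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # Consume the iterator so exceptions raised in workers propagate.
        list(pool.map(work, range(n_chunks)))
    return out


# Small arrays so the check runs quickly.
a = np.random.randn(1000, 1000)
b = np.random.randn(1000, 1000)
c = threaded_multiply(a, b)
print(np.array_equal(c, a * b))
```

Comparing against the plain `a * b` on a small input like this is a quick way to see whether a chunked version produces correct results before timing it on the full 10000 x 10000 case.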
Any help would be sincerely appreciated.