My NumPy array has 10 columns and around 2 million rows.
I need to analyze each column separately, find the values that are outliers, and delete the entire corresponding row from the array.
So I'd start with column 0, find outliers at rows 10, 20, and 100, and remove those rows. Then I'd analyze column 1 in the now-trimmed array and apply the same process.
Of course I can think of a straightforward manual process for this (iterate through each column, find the indices of the outliers, delete those rows, move on to the next column), but I've always found that NumPy has quick, nifty tricks for statistical tasks like this.
If you could also elaborate a bit on the runtime cost of the method, even better.
I'm not restricted to the NumPy library here; if SciPy has something helpful, I have no issue using it.
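For concreteness, here is a minimal sketch of the sequential process I described, using a z-score cut-off of 3 as a placeholder outlier criterion (the actual criterion and threshold are just illustrative assumptions). Each pass recomputes the column statistics on the already-trimmed array, as in my description:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(100_000, 10))
data[10, 0] = 100.0   # plant an obvious outlier in column 0
data[20, 1] = -100.0  # and another in column 1

# Trim column by column: stats for each column are computed
# on the array as trimmed by the previous columns.
for col in range(data.shape[1]):
    vals = data[:, col]
    z = np.abs((vals - vals.mean()) / vals.std())
    data = data[z < 3.0]  # boolean mask drops the outlier rows in one shot
```

Each iteration is a single O(rows) pass plus a copy from the boolean indexing, so the loop over 10 columns should still only touch the data a constant number of times, but I'd like to know if there is a cheaper or more idiomatic way.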
Thanks!