Essentially, I'd like to vectorize the for loop below. I used np.apply_along_axis to speed up the row comparison, but I'd also like to eliminate the outer for loop. Is there any chance of doing that? The final objective is a single mask that can be applied all at once (inside the loop, that is what I am doing for each list).

import pandas as pd
import numpy as np

original = pd.DataFrame(
    [np.arange(0, 500, 1), np.arange(1, 501, 1), np.arange(2, 502, 1), np.arange(3, 503, 1), np.arange(4, 504, 1)],
    index=list('abcde'),
    columns=pd.date_range('2011-1-1', periods=500)).transpose()

lists = np.array([[0, 1, 2, 3], [2, 3, 4, 5]]) #db_matrix.loc[:, ['a', 'b', 'c', 'd']].as_matrix()

# as_matrix() was removed in pandas 1.0; to_numpy() returns the same 2-D array
original_m_numpy_repr = original.loc[:, ['a', 'b', 'c', 'd']].to_numpy()

for l in lists:
    mask = np.apply_along_axis(lambda x: np.array_equal(x, l), 1, original_m_numpy_repr)
    print(original[mask])

Do you have any advice on how to do this?

Divakar
Asher11
  • IIUC you can get the row indices of the matching ones following the answers posted to [`this question`](http://stackoverflow.com/questions/38674027/find-the-row-indexes-of-several-values-in-a-numpy-array) and then simply use those row indices to index into the input array/dataframe for the desired o/p. – Divakar Sep 22 '16 at 14:07
  • *"`np.apply_along_axis` was used to speed the comparison"* `apply_along_axis` is just syntactic sugar - it's unlikely to be faster than a normal Python `for` loop (see [here](http://stackoverflow.com/a/23849233/1461210), for example). – ali_m Sep 22 '16 at 14:24
  • For real? In more than one instance it dramatically improved performance. – Asher11 Sep 22 '16 at 14:26
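
One possible way to build the single mask the question asks for is NumPy broadcasting: compare every row against every pattern in one step, then reduce. This is only a sketch under assumptions, not necessarily the approach from the linked answers. The names `rows`, `matches` and `result` are made up for illustration, the data is rebuilt to match the question, and `to_numpy()` assumes pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

# Same data as in the question: column k holds arange(k, 500 + k).
values = np.arange(500)[:, None] + np.arange(5)        # shape (500, 5)
original = pd.DataFrame(
    values,
    index=pd.date_range('2011-01-01', periods=500),
    columns=list('abcde'),
)
lists = np.array([[0, 1, 2, 3], [2, 3, 4, 5]])

rows = original[['a', 'b', 'c', 'd']].to_numpy()       # shape (500, 4)

# Broadcast (500, 1, 4) against (1, 2, 4): element-wise equality for
# every (row, pattern) pair, then require all 4 columns to match.
matches = (rows[:, None, :] == lists[None, :, :]).all(axis=2)   # shape (500, 2)

# Single combined mask: True where a row equals any of the patterns.
mask = matches.any(axis=1)
result = original[mask]
print(result)
```

Each column of `matches` is the mask the original loop computed for one entry of `lists`, so `original[matches[:, i]]` reproduces a single iteration if the per-pattern results are still needed. The intermediate boolean array has shape (rows, patterns, columns), so memory use grows with the number of patterns.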

0 Answers