Essentially, I'd like to vectorize the for loop below. I used np.apply_along_axis to speed up the row comparison, but I'd also like to eliminate the outer for loop over lists. Is there any way to do that? The final goal is a single mask that can be applied all at once, instead of the one mask per iteration that I build inside the loop now.
import pandas as pd
import numpy as np
original = pd.DataFrame(
    [np.arange(0, 500, 1), np.arange(1, 501, 1), np.arange(2, 502, 1), np.arange(3, 503, 1), np.arange(4, 504, 1)],
    index=list('abcde'),
    columns=pd.date_range('2011-1-1', periods=500)).transpose()
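lists = np.array([[0, 1, 2, 3], [2, 3, 4, 5]])  # in the real code: db_matrix.loc[:, ['a', 'b', 'c', 'd']].to_numpy()
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
original_m_numpy_repr = original.loc[:, ['a', 'b', 'c', 'd']].to_numpy()
for l in lists:
    # one boolean mask per target list: True for rows equal to l
    mask = np.apply_along_axis(lambda x: np.array_equal(x, l), 1, original_m_numpy_repr)
    print(original[mask])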
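For comparison, a minimal sketch of the inner step without np.apply_along_axis, assuming plain NumPy broadcasting is acceptable: comparing the whole 2-D array against l at once and reducing with all(axis=1) gives the same per-row mask, although the outer loop over lists is still there. The name arr is a hypothetical shorthand for original_m_numpy_repr.

```python
import numpy as np
import pandas as pd

# Same toy data as above: 500 daily rows, columns a..e, row i = [i, i+1, i+2, i+3, i+4]
original = pd.DataFrame(
    [np.arange(k, 500 + k) for k in range(5)],
    index=list('abcde'),
    columns=pd.date_range('2011-1-1', periods=500),
).transpose()

lists = np.array([[0, 1, 2, 3], [2, 3, 4, 5]])
arr = original.loc[:, ['a', 'b', 'c', 'd']].to_numpy()  # hypothetical short name, shape (500, 4)

for l in lists:
    # arr == l broadcasts l against every row -> (500, 4) booleans;
    # all(axis=1) keeps only rows where all four values match
    mask = (arr == l).all(axis=1)
    print(original[mask])
```

This removes the Python-level lambda call per row, but still builds one mask per entry of lists.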
Any advice on how to do this?
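One possible sketch for dropping the outer loop too, assuming the data fits in memory: add a new axis so that every row is compared with every entry of lists in a single broadcast, then collapse with any(axis=1) into the single combined mask described at the top. This broadcasting approach is a swapped-in technique, not part of the original code, and arr is again a hypothetical shorthand for original_m_numpy_repr.

```python
import numpy as np
import pandas as pd

# Same toy data as in the question
original = pd.DataFrame(
    [np.arange(k, 500 + k) for k in range(5)],
    index=list('abcde'),
    columns=pd.date_range('2011-1-1', periods=500),
).transpose()

lists = np.array([[0, 1, 2, 3], [2, 3, 4, 5]])          # shape (2, 4)
arr = original.loc[:, ['a', 'b', 'c', 'd']].to_numpy()  # shape (500, 4)

# arr[:, None, :] is (500, 1, 4) and lists[None, :, :] is (1, 2, 4);
# broadcasting compares every row with every target list -> (500, 2, 4)
matches = (arr[:, None, :] == lists[None, :, :]).all(axis=2)  # (500, 2): row i equals lists[j]
mask = matches.any(axis=1)                                     # (500,): row equals any list

print(original[mask])
```

If per-list results are still needed, matches[:, j] is the mask for lists[j] alone. The intermediate array holds n_rows × n_lists × n_columns booleans, so with a very large lists a join-based approach (for example DataFrame.merge on columns a to d) may use less memory.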