Let's say I want to select a value from a different column for each row. Then, I might do something like this:
import numpy as np

a = np.arange(12).reshape(3, 4)
columns = np.array([1, 2, 0])  # one column index per row
a[np.arange(a.shape[0]), columns]  # picks a[0, 1], a[1, 2], a[2, 0]
Having to spell out the entire row range seems a bit 'ugly' to me; moreover, even the arange call takes time:
%timeit np.arange(int(1e6))
1.03 ms ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Is there a way to avoid using arange?
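For reference, one candidate I am aware of is np.take_along_axis, which at least removes the explicit arange from the calling code. Below is a minimal sketch using the same a and columns as above; the names picked and reference are only for illustration, and I have not measured whether this is actually any faster than the arange version.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
columns = np.array([1, 2, 0])

# take_along_axis expects the index array to have the same number of
# dimensions as `a`, so add a trailing axis and drop it again afterwards.
picked = np.take_along_axis(a, columns[:, None], axis=1)[:, 0]

# The explicit arange-based indexing from the question, for comparison.
reference = a[np.arange(a.shape[0]), columns]

print(picked)
print(np.array_equal(picked, reference))
```

This keeps the call site free of the row range, but it is a function call rather than the a[:, columns]-style notation I am after.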
Generalizing the above question: how would one go about selecting not single values, but a different set of adjacent columns (each set of equal size) for each row? I would like to avoid creating many aranges by hand, like so:
rows = np.array([0, 2])
start_values = np.array([0, 1])
window_length = 3
column_ranges = np.array(list(map(lambda j: np.arange(j, j + window_length), start_values)))
Right now, the only way I see to use the above column ranges is to index like so:
a[rows, :][:, column_ranges][np.arange(len(rows)), np.arange(len(rows)), :]
Ideally, I'd like to use a notation like a[:, columns] instead of a[np.arange(a.shape[0]), columns], and a[:, columns:columns + window_length] instead of a[rows, :][:, column_ranges][np.arange(len(rows)), np.arange(len(rows)), :].
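To make the windowed case concrete, here is a sketch of the shortest form I can come up with using broadcasting. It still needs one small arange of length window_length, so it is not the slice-like notation I would prefer. The names windows and chained are only for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
start_values = np.array([0, 1])
window_length = 3

# Broadcasting builds every column window at once:
# (2, 1) + (3,) -> shape (len(rows), window_length).
column_ranges = start_values[:, None] + np.arange(window_length)

# rows[:, None] has shape (2, 1) and broadcasts against column_ranges (2, 3),
# so row i is paired only with its own window of columns.
windows = a[rows[:, None], column_ranges]

# The chained indexing from the question, for comparison.
chained = a[rows, :][:, column_ranges][np.arange(len(rows)), np.arange(len(rows)), :]

print(windows)
print(np.array_equal(windows, chained))
```

Pairing rows with windows through broadcasting avoids building the intermediate (len(rows), len(rows), window_length) array that the chained version creates before taking its diagonal.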