Suppose I have a very big np.array with N elements, and I want to select only the values that pass S selections. The usual way is:
selected_items = original_array[selection1(original_array) & (original_array > 3)]
This is fine, but it uses a lot of temporary memory: if I am correct, I need S boolean masks of size N, plus at least one more for the result of the &. Is there a better solution in terms of memory usage? For example, an explicit loop doesn't need any of this:
selected_items = []
tests = (selection1, lambda x: x > 3)
for x in original_array:
    if all(t(x) for t in tests):
        selected_items.append(x)
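For comparison, here is one way I could imagine trimming the NumPy version's temporaries (just a sketch, with a made-up selection1 and a small array standing in for the big one): fold each selection into a single reusable mask via the out= argument of np.logical_and:

```python
import numpy as np

original_array = np.arange(10.0)   # small stand-in for the big array

def selection1(a):                 # hypothetical first selection
    return a % 2 == 0

# Reuse one boolean mask: fold each further selection into it with
# np.logical_and(..., out=mask), so at most two N-sized boolean
# temporaries are alive at any moment instead of S + 1.
mask = selection1(original_array)
np.logical_and(mask, original_array > 3, out=mask)
selected_items = original_array[mask]
```

This still briefly materializes each comparison result (original_array > 3), so it only caps the peak at two masks rather than eliminating temporaries entirely.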
I like NumPy, but its design is really memory-hungry, so it seems unsuitable for processing big data. On the other hand, an explicit loop in Python is not very performant.
Is there a solution with NumPy?
Are there other Python-based frameworks for big-data analysis?