The following is reproducible and returns the desired results.
import pandas as pd, numpy as np
np.random.seed(3124)
x = 10 + np.random.rand(10)
y = np.split(10 + np.random.rand(100), 10)
x >= y
# array([[False, True, True, False, False, False, False, True, False, True],
# ...
# [False, True, True, True, False, True, False, True, False, False]])
np.apply_along_axis(np.greater_equal, 0, x , y)
# same results as x >= y.
However, if x and y from above were pulled out of a pandas DataFrame, I have to convert the pandas Series of arrays to a list of arrays first. This is very computationally expensive for a large Series.
How can I do this more efficiently?
df = pd.DataFrame({'x':x,'y':y})
df['x'].values >= df['y'].tolist()
# same results as above.
df['x'] >= df['y']
# ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
df['x'].values >= df['y'].values
# ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
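For reference, here is a minimal sketch of one way to avoid the `tolist()` call, assuming every array in `df['y']` has the same length. It stacks the object column into a single 2-D array with `np.stack` and lets broadcasting do the comparison. The names `y_2d` and `result` are only illustrative, and I have not benchmarked this against `tolist()` on a large frame.

```python
import numpy as np
import pandas as pd

# Same reproducible setup as above.
np.random.seed(3124)
x = 10 + np.random.rand(10)
y = np.split(10 + np.random.rand(100), 10)
df = pd.DataFrame({'x': x, 'y': y})

# Assumption: all arrays in df['y'] have equal length, so they can be
# stacked into one 2-D float array instead of a list of arrays.
y_2d = np.stack(df['y'].to_numpy())      # shape (10, 10)

# Broadcasting compares the 1-D x against every row of y_2d,
# which is the same comparison as `x >= y` above.
result = df['x'].to_numpy() >= y_2d
```

This only works while the arrays are all the same length; `np.stack` raises an error for ragged input.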
Edit
@Divakar gave the correct answer to the question above. However, in my actual use case the arrays in y will all be different lengths.
Below I use y from above to create y2, which is closer to my data. The following is reproducible.
y2 = [np.resize(a, r) for a,r in zip(y,np.random.randint(2, 10, 10))]
# yields something like:
# [array([10.1269906 , 10.34269353, 10.39461373, 10.022271 , 10.69316165, 10.83981557, 10.03328485, 10.56850597]),
# array([10.99159117, 10.21215159, 10.65208435, 10.22483111, 10.13748229, 10.72621328]),
# ...
# array([10.61071355, 10.62141997]),
# array([10.3899659 , 10.66207985, 10.85937807]),
# array([10.38374303, 10.93140162, 10.88535643, 10.51529231, 10.60723795, 10.60504599, 10.6773523 ]),
# array([10.02775067, 10.91382588, 10.31222259, 10.44732757, 10.16980452, 10.88914854, 10.22677905])]
The following returns the results I want, but the Python-level loop is too slow for the size of my actual data frame. I would rather do it in a vectorized form with NumPy.
[x[i] >= y2[i] for i in range(len(y2))]
# returns
# [array([False, False, False, False, False, False, False, False]),
# array([False, True, False, True, True, False]),
# ...
# array([ True, True]),
# array([ True, False, False]),
# array([False, False, False, False, False, False, False]),
# array([ True, True, True, True, True, True, True])]
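To make the goal concrete, here is a rough sketch of the kind of vectorized form I have in mind for the ragged case. It flattens `y2` into one array, repeats each `x[i]` once per element of `y2[i]`, does a single comparison, and splits the result back at the original boundaries. The names `lens`, `flat`, `mask` and `result` are only illustrative, and building `lens` still loops over the list once, so this is a sketch under those assumptions rather than a tuned solution.

```python
import numpy as np

# Same reproducible setup as above.
np.random.seed(3124)
x = 10 + np.random.rand(10)
y = np.split(10 + np.random.rand(100), 10)
y2 = [np.resize(a, r) for a, r in zip(y, np.random.randint(2, 10, 10))]

# Length of each ragged array, and all values laid out end to end.
lens = np.array([len(a) for a in y2])
flat = np.concatenate(y2)

# Repeat x[i] len(y2[i]) times so it lines up with flat, then compare once.
mask = np.repeat(x, lens) >= flat

# Cut the flat boolean mask back into one array per original row.
result = np.split(mask, np.cumsum(lens)[:-1])
```

If the per-row arrays are not needed afterwards, `mask` on its own can be used directly and the final `np.split` skipped.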