I have two pandas Series:
n1 = pd.Series([1,2,3, np.nan, np.nan, 4, 5], index=[3,4,5,6,7,8,9])
n2 = pd.Series([np.nan, np.nan, 4, 5, 3,], index=[2, 4, 5, 10, 11])
The data looks like the following; the last column is the result I want to get (a blank cell means the index label does not exist in that Series):
index  n1   n2   expected result (n1 < n2)
2           na   na
3      1         na
4      2    na   na
5      3    4    True
6      na        na
7      na        na
8      4         na
9      5         na
10          5    na
11          3    na
Here is my current solution, which is very inefficient:
import datetime

import numpy as np
import pandas as pd

n1 = pd.Series([1, 2, 3, np.nan, np.nan, 4, 5], index=[3, 4, 5, 6, 7, 8, 9])
n2 = pd.Series([np.nan, np.nan, 4, 5, 3], index=[2, 4, 5, 10, 11])

def GT(n1, n2):
    n1_index = n1.index.values
    n2_index = n2.index.values
    # Sorted union of both indexes
    index = np.sort(list(set(list(n1_index) + list(n2_index))))
    # Rebuild each Series on the common index, leaving NaN where a label is absent
    new_n1 = pd.Series(np.nan, index=index)
    new_n1.loc[n1_index] = n1.values
    new_n2 = pd.Series(np.nan, index=index)
    new_n2.loc[n2_index] = n2.values
    # object dtype so the result can hold True, False and NaN
    output = pd.Series(new_n1.values < new_n2.values, index=index, dtype=object)
    # Nullify every position where either side has no value
    output.loc[new_n1.isnull() | new_n2.isnull()] = np.nan
    return output

# Rough timing: call the function 500 times
starttime = datetime.datetime.now()
for i in range(500):
    GT(n1, n2)
endtime = datetime.datetime.now()
print(endtime - starttime)
My rough idea is to rebuild the two Series on an identical index and then compare them, but my current solution is very slow. The for loop is only there to measure the computation cost.
The difficult part for me is how to efficiently compare the two values at the same index, and how best to nullify the result where there is no value in n1 or n2.
Is there a better solution, especially a more time-efficient one?
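For reference, the "rebuild on a common index, compare, then nullify" idea can probably be written more compactly with `Series.align`. The following is only a minimal sketch of that direction, not a benchmarked answer: the helper name `lt_aligned` is hypothetical, and it assumes that both a missing index label and an explicit NaN should produce NaN in the result.

```python
import numpy as np
import pandas as pd

n1 = pd.Series([1, 2, 3, np.nan, np.nan, 4, 5], index=[3, 4, 5, 6, 7, 8, 9])
n2 = pd.Series([np.nan, np.nan, 4, 5, 3], index=[2, 4, 5, 10, 11])


def lt_aligned(a, b):
    """Hypothetical helper: a < b on the union index, NaN where either side is missing."""
    # align() reindexes both Series to the union of their indexes,
    # inserting NaN for labels that one side lacks.
    a2, b2 = a.align(b, join="outer")
    # Comparisons against NaN evaluate to False, so cast to object
    # and overwrite those positions with NaN afterwards.
    result = a2.lt(b2).astype(object)
    result[a2.isna() | b2.isna()] = np.nan
    return result


out = lt_aligned(n1, n2)
print(out)
```

With the sample data, only index 5 has a value on both sides, so it is the only non-null entry. Whether this is actually faster than the loop-free version above would still need to be measured.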