I have a huge DataFrame, and I want to apply a custom function to several of its columns and put the result in a new column, but I have run into a problem. The following is my function to calculate the Euclidean distance between two rows.
def calcDist(p, q):
    diff = p - q
    square_diff = diff ** 2
    sum_square_diff = square_diff.sum()
    return sum_square_diff ** 0.5
One parameter of the function is constant (a Series of 0s and 1s). The other parameter is the data from the selected columns of the DataFrame, taken one row at a time (also something like a Series of 0s and 1s). I have tried the following code.
cols = ['a','b','c']
new = [0,1,1]
df.columns = ['aa','a','b','c','dd','ee']
df['dist'] = df.loc[:, cols].apply(lambda x: calcDist(x, new))
But I get NaN in the 'dist' column. I have already tried a for loop to solve this problem, but it is too slow.
house_chosen['dist'] = 0
for i in range(len(house_chosen)):
    cols_chosen = house_chosen.loc[:, addition_list]
    series_chosen = cols_chosen.iloc[i, :]
    house_chosen.iloc[i, 42] = calcDist(new_house_addition, series_chosen)
So is there any way to solve this problem with the apply function? Thanks.
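For reference, here is a minimal sketch of what the row-wise version might look like. A likely cause of the NaN values is that `DataFrame.apply` defaults to `axis=0`, so the function receives each column rather than each row, and the result is indexed by column names ('a', 'b', 'c') that do not match the row labels when it is assigned to `df['dist']`. The sample data below is hypothetical and only there to make the snippet runnable; the vectorized NumPy variant is an alternative I am assuming would be faster on a large frame, not something I have measured here.

```python
import numpy as np
import pandas as pd

def calcDist(p, q):
    # Euclidean distance between two equal-length vectors
    diff = p - q
    return (diff ** 2).sum() ** 0.5

# Hypothetical sample data standing in for the real DataFrame
df = pd.DataFrame({
    'aa': [10, 20, 30],
    'a': [0, 1, 0],
    'b': [1, 1, 0],
    'c': [1, 0, 0],
})
cols = ['a', 'b', 'c']
new = np.array([0, 1, 1])  # the constant vector, as a NumPy array

# Option 1: apply row by row (axis=1), so each call receives one row
dist_apply = df[cols].apply(lambda row: calcDist(row, new), axis=1)

# Option 2: vectorized, broadcasting `new` across every row at once
diff = df[cols].to_numpy() - new
dist_vec = np.sqrt((diff ** 2).sum(axis=1))

df['dist'] = dist_vec
```

Both options should produce the same distances; the second avoids calling a Python function once per row.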