I have a huge DataFrame, and I want to apply a custom function to several of its columns and put the result in a new column, but I have run into a problem. The following is my function for calculating the (Euclidean) distance between two rows.

def calcDist(p, q):
    diff = p - q
    square_diff = diff ** 2
    sum_square_diff = square_diff.sum()
    return sum_square_diff ** 0.5

One argument of the function is constant (a series of 0s and 1s); the other is the data from the selected columns of one DataFrame row (also something like a series of 0s and 1s). I have tried the following code.

cols = ['a', 'b', 'c']
new = [0, 1, 1]
df.columns = ['aa', 'a', 'b', 'c', 'dd', 'ee']
df['dist'] = df.loc[:, cols].apply(lambda x: calcDist(x, new))

But I get NaN in the 'dist' column. I have already tried a for loop to solve this problem, but it is too slow.

house_chosen['dist'] = 0
for i in range(len(house_chosen)):
    cols_chosen = house_chosen.loc[:, addition_list]
    series_chosen = cols_chosen.iloc[i, :]
    house_chosen.iloc[i, 42] = calcDist(new_house_addition, series_chosen)

So is there a way to solve this problem with the apply function? Thanks.

Chunk_Ning
  • Can you add a small data sample? – jezrael Oct 04 '17 at 07:14
  • @jezrael, the data to calculate on are two series of 0s and 1s. I just want to calculate the distance between them, but one of them comes from a DataFrame that consists of 43 columns. So how could I add a small data sample? – Chunk_Ning Oct 04 '17 at 07:22
  • If you want to calculate the value for each row, you should add the axis=1 parameter to your apply call. – AndreyF Oct 04 '17 at 07:47
  • @AndreyF, yes, I forgot to add axis=1. Thanks. – Chunk_Ning Oct 04 '17 at 07:56
  • @Chunk_Ning - I believe [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) should help. – jezrael Oct 04 '17 at 08:40
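
For reference, the axis=1 suggestion from the comments can be sketched roughly as follows. The sample DataFrame here is made up purely for illustration (the real data has 43 columns and was not posted), and `new` is turned into a NumPy array as an assumption so the subtraction is unambiguous. The second assignment is an optional vectorized variant that is not from the thread; it would likely be faster on a large frame, but treat it as a sketch rather than a tested solution for the original data.

```python
import numpy as np
import pandas as pd


def calcDist(p, q):
    # Euclidean distance between two equal-length vectors
    diff = p - q
    square_diff = diff ** 2
    sum_square_diff = square_diff.sum()
    return sum_square_diff ** 0.5


# Hypothetical sample data, invented for this example
df = pd.DataFrame({
    'aa': [1, 2, 3, 4],
    'a':  [0, 1, 0, 1],
    'b':  [1, 1, 0, 0],
    'c':  [1, 0, 0, 1],
    'dd': [5, 6, 7, 8],
    'ee': [9, 10, 11, 12],
})

cols = ['a', 'b', 'c']
new = np.array([0, 1, 1])  # the constant vector (assumed to be array-like)

# axis=1 passes each ROW (restricted to cols) to the function;
# the result is indexed like df, so the assignment aligns correctly.
df['dist'] = df[cols].apply(lambda row: calcDist(row, new), axis=1)

# Optional vectorized variant (an addition, not from the thread):
# broadcast the subtraction over all rows at once.
df['dist_vec'] = np.sqrt(((df[cols].to_numpy() - new) ** 2).sum(axis=1))

print(df[['a', 'b', 'c', 'dist', 'dist_vec']])
```

Without axis=1, apply passes each column to the function, so the result is indexed by the column names 'a', 'b', 'c' rather than by the row labels. Assigning that to df['dist'] then aligns on labels that do not exist in the row index, which would explain the NaN values.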

0 Answers