I have two dataframes; let's call the first one df and the second one compare_df. The first one looks like this:
Date        cell  tumor_size (assume it is three-dimensional)
25/10/2015  113   [51, 52, 55]
22/10/2015  222   [50, 68, 22]
22/10/2015  883   [45, 23, 67]
20/10/2015  334   [35, 23, 76]
and the second one looks like this:
Date        cell  tumor_size
19/10/2015  564   [47, 23, 56]
19/10/2015  123   [56, 11, 23]
22/10/2014  345   [36, 66, 78]
13/12/2013  456   [44, 21, 83]
For each row in df, I want to go through every row in compare_df, compute the Euclidean distance between the two tumor_size vectors, and keep the minimum one. This is the code I wrote to try to accomplish this:
# These will be our lists of pairs and size differences.
pairs = []
diffs = []
for row in df.itertuples():
    compare_df['distance'] = np.linalg.norm(compare_df.tumor_size - row.tumor_size)  # I get error for this line
    row_of_interest = compare_df.loc[compare_df.distance == compare_df.distance.min()]
    pairs.append(row_of_interest.cell.values[0])
    diffs.append(row_of_interest.distance.values[0])
df['most_similar_to'] = pairs
df['similarity'] = diffs
However, I get the following error:
ValueError: Length of values does not match length of index
This happens even though all the vectors have the same length and I have dropped the NaN values. Any ideas?
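
For reference, here is a minimal sketch of the computation described above. It assumes every tumor_size entry is a list of the same length. Instead of subtracting a list from a Series of lists, it stacks the lists into 2-D NumPy arrays and uses broadcasting to get all pairwise distances at once. The sample frames just reproduce the tables above, and the names a, b, dist and nearest are only illustrative; this is one possible approach, not necessarily the best one.

```python
import numpy as np
import pandas as pd

# Sample data copied from the tables in the question.
df = pd.DataFrame({
    "Date": ["25/10/2015", "22/10/2015", "22/10/2015", "20/10/2015"],
    "cell": [113, 222, 883, 334],
    "tumor_size": [[51, 52, 55], [50, 68, 22], [45, 23, 67], [35, 23, 76]],
})
compare_df = pd.DataFrame({
    "Date": ["19/10/2015", "19/10/2015", "22/10/2014", "13/12/2013"],
    "cell": [564, 123, 345, 456],
    "tumor_size": [[47, 23, 56], [56, 11, 23], [36, 66, 78], [44, 21, 83]],
})

# Stack the per-row lists into 2-D float arrays of shape (n_rows, 3).
# Assumption: every list has the same length.
a = np.array(df["tumor_size"].tolist(), dtype=float)
b = np.array(compare_df["tumor_size"].tolist(), dtype=float)

# Broadcasting produces every pairwise difference with shape
# (len(df), len(compare_df), 3); the norm over the last axis gives
# a (len(df), len(compare_df)) matrix of Euclidean distances.
dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

# For each row of df, pick the position of the closest row of compare_df.
nearest = dist.argmin(axis=1)
df["most_similar_to"] = compare_df["cell"].to_numpy()[nearest]
df["similarity"] = dist[np.arange(len(df)), nearest]
```

The distance matrix needs len(df) * len(compare_df) * 3 floats in memory, so for very large frames a chunked loop over df might be preferable.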