How to compare all values in a column of a pandas DataFrame with an arbitrary value using a specific comparison function?

Asked Jan 11 '22 at 12:52

Active Jan 11 '22 at 13:58

Viewed 53 times

I have a DataFrame with filenames in the first column and a vector of decimal numbers in the second column (the column itself is a pandas Series). The DataFrame was loaded from a CSV that looks like this:

,filename,vector
0,my-filename,"[1.2 3.1 2.6 ...]"
1,another-filename,"[1.1 3.3 2.2 ...]"
...
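For reference, here is a minimal sketch of how such a file could be loaded so that each vector becomes a NumPy array instead of a string. The inline CSV text, the helper name parse_vector, and the three-element vectors are assumptions made for illustration; the real vectors are longer.

```python
import io

import numpy as np
import pandas as pd

# Illustrative stand-in for the real file; vectors are shortened to three values.
csv_text = ''',filename,vector
0,my-filename,"[1.2 3.1 2.6]"
1,another-filename,"[1.1 3.3 2.2]"
'''


def parse_vector(text):
    """Turn a string like '[1.2 3.1 2.6]' into a 1-D float array (hypothetical helper)."""
    return np.array(text.strip("[]").split(), dtype=float)


# The unnamed first CSV column holds the row index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
df["vector"] = df["vector"].apply(parse_vector)

# Stack the per-row arrays into one 2-D matrix: one row per file.
matrix = np.stack(df["vector"].tolist())
```

Having all vectors in a single 2-D array is what makes it possible to compute every distance in one call later on.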

I have the function scipy.spatial.distance.correlation(vec1, vec2) and an input vector. I need to compare that input vector with every vector in the DataFrame using that function and get the n most correlated filenames.

Right now I do this by iterating over the DataFrame, calculating the correlations, saving the results, sorting them, and then taking the n most correlated. I have read this answer, which basically says that iterating over a DataFrame is bad (unless you have a very good reason), so I am wondering if there is a better way. I can also adjust the arrangement of data in the DataFrame if needed.
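Roughly, the current loop-based approach looks like the sketch below. The DataFrame contents, the name query, and the value of n are made up for illustration. Note that scipy.spatial.distance.correlation returns a distance (1 minus the Pearson correlation), so the most correlated vectors are the ones with the smallest values.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import correlation

# Made-up example data; the real vectors come from the CSV.
df = pd.DataFrame({
    "filename": ["a", "b", "c"],
    "vector": [
        np.array([1.0, 2.0, 3.0]),
        np.array([3.0, 2.0, 1.0]),
        np.array([1.0, 2.0, 2.5]),
    ],
})
query = np.array([1.0, 2.0, 3.1])  # the input vector
n = 2                              # how many filenames to return

# Compute one distance per row, then sort ascending (smallest distance first).
results = []
for _, row in df.iterrows():
    results.append((row["filename"], correlation(query, row["vector"])))
results.sort(key=lambda pair: pair[1])

top_n = [name for name, _ in results[:n]]
```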

So, how can one "vectorize" this?
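One direction that might work, as a sketch rather than a confirmed solution: stack the vectors into a 2-D array and let scipy.spatial.distance.cdist with metric="correlation" compute all distances in one call. The example data, query, and n below are assumptions, and this only applies if all vectors have the same length.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Made-up example data; the real vectors come from the CSV.
df = pd.DataFrame({
    "filename": ["a", "b", "c"],
    "vector": [
        np.array([1.0, 2.0, 3.0]),
        np.array([3.0, 2.0, 1.0]),
        np.array([1.0, 2.0, 2.5]),
    ],
})
query = np.array([1.0, 2.0, 3.1])  # the input vector
n = 2                              # how many filenames to return

# One row per file; requires every vector to have the same length.
matrix = np.stack(df["vector"].tolist())

# cdist expects 2-D inputs, so the query becomes a 1 x dim array.
# The result has shape (1, number_of_rows); take its only row.
distances = cdist(query[np.newaxis, :], matrix, metric="correlation")[0]

# Smallest correlation distance means most correlated.
top_n = (
    df.assign(distance=distances)
      .nsmallest(n, "distance")["filename"]
      .tolist()
)
```

This replaces the Python-level loop with a single call over the whole matrix. If the matrix is built once and reused for many queries, the stacking cost is paid only once.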

edited Jan 11 '22 at 13:58

asked Jan 11 '22 at 12:52

dosvarog

0 Answers