I have a function, called through apply, that goes through a list of indexes, passes each one to a scikit-learn KNN model, and returns two lists of size n (neighbor distances and neighbor indexes). (Imagine this is for a movie recommendation system.)
I want to add these results to a new DataFrame.
For example, if my function iterates through 3 indexes and the n_neighbors parameter is 5, I should get a DataFrame with 2 columns and 3 x 5 = 15 rows.
But currently my script is appending the entire list to one row, as seen below.
This is my code. movies is the DataFrame that holds the input indexes.
testDF = pd.DataFrame()
def get_distances_indices(index):
    distances, indices = model_knn.kneighbors(data[index], n_neighbors=6)
    distances = pd.Series(distances.flatten().tolist())
    indices = pd.Series(indices.flatten().tolist())
    return indices, distances
testDF[['index','distance']] = testDF.append(movies.apply(lambda row: get_distances_indices(row['index']), axis=1).apply(pd.Series),ignore_index=True)
Any help is appreciated. I am a beginner, and I read articles saying that using apply here would speed up getting the list of neighbors.
For the sake of simplicity, here is a reproducible example. I just want the lists/Series to show up in vertical order, not horizontal.
import pandas as pd

testDF = pd.DataFrame()
moviesData = {'movie': ['The Big Whale', 'Stack Underflow'], 'index': [3, 99]}
movies = pd.DataFrame(data=moviesData)
def get_distances_indices(index):
    # Fixed dummy values standing in for the KNN output
    list1 = [51, 700, 999]
    list2 = [.2, .3, .4]
    df2 = pd.Series(list1)
    df3 = pd.Series(list2)
    return df2,df3
testDF[['index','distance']] = movies.apply(lambda row: get_distances_indices(row['index']), axis=1).apply(pd.Series)
testDF.head()
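To make the target layout concrete, here is a minimal sketch of the vertical shape I am after for the toy example: 2 input indexes x 3 neighbors = 6 rows and 2 columns. It assumes each call returns two equal-length lists. `neighbors_frame` and `expected` are hypothetical names used only for illustration, and stacking with `pd.concat` is just one possible way to build it, not necessarily the best or fastest one.

```python
import pandas as pd

movies = pd.DataFrame({'movie': ['The Big Whale', 'Stack Underflow'],
                       'index': [3, 99]})

def get_distances_indices(index):
    # Same dummy values as in the example above (stand-in for the KNN call)
    return [51, 700, 999], [.2, .3, .4]

def neighbors_frame(index):
    # Hypothetical helper: one small two-column frame per input index
    indices, distances = get_distances_indices(index)
    return pd.DataFrame({'index': indices, 'distance': distances})

# Stack the per-movie frames on top of each other instead of side by side
expected = pd.concat([neighbors_frame(i) for i in movies['index']],
                     ignore_index=True)
print(expected.shape)  # (6, 2)
```

Each neighbor gets its own row here, so the result grows downward with the number of input indexes rather than sideways.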