Append series results from apply() to new DataFrame?

Question

I have an apply function that goes through a list of indexes, plugs each one into a scikit-learn KNN model, and returns two lists of size n (neighbor distances and neighbor indexes). Imagine this is for a movie recommendation system.

I want to add these results to a new DF.

For example, if my function iterates through 3 indexes and the n_neighbors parameter is 5, I should get a DataFrame with 2 columns and 3x5 = 15 rows. Currently, though, my script puts each entire list into a single row, as seen below.

This is my code; movies is the DF that holds the input indexes.

testDF = pd.DataFrame()

def get_distances_indices(index):

    distances, indices = model_knn.kneighbors(data[index], n_neighbors=6)

    distances = pd.Series(distances.flatten().tolist())
    indices = pd.Series(indices.flatten().tolist())

    return indices, distances

testDF[['index','distance']] = testDF.append(movies.apply(lambda row: get_distances_indices(row['index']), axis=1).apply(pd.Series),ignore_index=True)

Any help is appreciated. I am a beginner and saw articles saying that using apply here would speed up getting the lists of neighbors.
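On the speed question: `DataFrame.apply(..., axis=1)` is essentially a Python-level loop over rows, so it is generally not faster than a plain loop. scikit-learn's `kneighbors` already accepts a 2-D array of many query rows and returns `distances` and `indices` arrays of shape `(n_queries, n_neighbors)`, so a single batched call may avoid the per-row apply altogether. The sketch below uses made-up NumPy arrays as stand-ins for that output (assuming `data` supports row selection with an array of positions, e.g. a NumPy array or SciPy sparse matrix); `ravel` flattens row by row and `np.repeat` labels each neighbor with the query it belongs to.

```python
import numpy as np
import pandas as pd

query_indexes = np.array([3, 7, 99])
n_neighbors = 5

# Hypothetical stand-ins for:
#   distances, indices = model_knn.kneighbors(data[query_indexes], n_neighbors=n_neighbors)
# Both arrays have shape (n_queries, n_neighbors).
distances = np.linspace(0.0, 1.4, num=15).reshape(len(query_indexes), n_neighbors)
indices = np.arange(15).reshape(len(query_indexes), n_neighbors)

long_df = pd.DataFrame({
    # Repeat each query index once per neighbor: 3,3,3,3,3,7,...
    'query': np.repeat(query_indexes, n_neighbors),
    # Row-major flattening: all neighbors of the first query come first
    'index': indices.ravel(),
    'distance': distances.ravel(),
})
print(long_df)
```

The result has one row per (query, neighbor) pair, i.e. 3x5 = 15 rows; the extra `query` column is optional but makes it easy to tell which input each neighbor came from.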

For the sake of simplicity, here is a reproducible example. I just want the lists/Series to be stacked vertically (one value per row), not laid out horizontally.

import pandas as pd

testDF = pd.DataFrame()
moviesData = {'movie': ['The Big Whale', 'Stack Underflow'], 'index': [3, 99]}
movies = pd.DataFrame(data=moviesData)

def get_distances_indices(index):
    list1 = [51, 700, 999]
    list2 = [.2, .3, .4]
    df2 = pd.Series(list1)
    df3 = pd.Series(list2)

    return df2,df3

testDF[['index','distance']] = movies.apply(lambda row: get_distances_indices(row['index']), axis=1).apply(pd.Series)
testDF.head()
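The horizontal layout comes from each movie producing one row whose cells hold whole Series. A minimal sketch of one way to get the vertical layout for this example, assuming the helper is changed to return one small DataFrame per movie (`get_neighbors_frame` is a hypothetical name, not from the original), is to stack those frames with `pd.concat`:

```python
import pandas as pd

movies = pd.DataFrame({'movie': ['The Big Whale', 'Stack Underflow'], 'index': [3, 99]})

def get_neighbors_frame(index):
    # Hypothetical variant of get_distances_indices: returns a DataFrame
    # with one row per neighbor instead of a tuple of two Series
    list1 = [51, 700, 999]
    list2 = [.2, .3, .4]
    return pd.DataFrame({'index': list1, 'distance': list2})

# Stack the per-movie frames vertically and renumber the rows 0..n-1
testDF = pd.concat(
    [get_neighbors_frame(i) for i in movies['index']],
    ignore_index=True,
)
print(testDF)
```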

Please take a look at [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). We don't really care where the data comes from. We need small sample data structures that we can copy and paste into our interpreters and the desired output data structure. — timgeb, May 06 '20 at 19:39
@timgeb I have added a reproducible example, let me know if I should add anything else. Thanks — AxW, May 06 '20 at 20:02

score 1 · Accepted Answer · answered May 06 '20 at 21:21

You could try something like this:

...

def get_distances_indices(index):
    list1 = [51, 700, 999]
    list2 = [.2, .3, .4]

    # return a dictionary
    return {'index':list1, 'distance':list2}

d = movies.apply(lambda row: get_distances_indices(row['index']), axis=1)

# flatten the resulting lists
l1 = [item for sublist in [x['index'] for x in d] for item in sublist]
l2 = [item for sublist in [x['distance'] for x in d] for item in sublist]

data_tuples = list(zip(l1,l2))
pd.DataFrame(data=data_tuples, columns=['index', 'distance'])

If I understood your question correctly, this should give you your desired result:

   index  distance
0     51       0.2
1    700       0.3
2    999       0.4
3     51       0.2
4    700       0.3
5    999       0.4
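The nested list comprehensions in the answer's flattening step can also be written with `itertools.chain.from_iterable`, which concatenates the per-row lists in order. A sketch, with the Series of dicts written out by hand using the example's values (standing in for `d` from the answer):

```python
from itertools import chain

import pandas as pd

# Stand-in for d = movies.apply(...): one dict of lists per movie
d = pd.Series([
    {'index': [51, 700, 999], 'distance': [.2, .3, .4]},
    {'index': [51, 700, 999], 'distance': [.2, .3, .4]},
])

# chain.from_iterable yields the items of each inner list in sequence
flat = pd.DataFrame({
    'index': list(chain.from_iterable(x['index'] for x in d)),
    'distance': list(chain.from_iterable(x['distance'] for x in d)),
})
print(flat)
```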

I believe this is what I was looking for, thank you. — AxW, May 07 '20 at 19:43
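Another option, assuming pandas 1.3 or later (where `DataFrame.explode` accepts a list of columns), is to keep one row per movie with list-valued cells and explode both list columns in lockstep. This also keeps the source movie next to each neighbor. A sketch using the example's data:

```python
import pandas as pd

movies = pd.DataFrame({'movie': ['The Big Whale', 'Stack Underflow'], 'index': [3, 99]})

def get_distances_indices(index):
    # Placeholder neighbor lists, as in the reproducible example
    list1 = [51, 700, 999]
    list2 = [.2, .3, .4]
    return {'index': list1, 'distance': list2}

# One row per movie; each 'index'/'distance' cell holds a whole list
nested = pd.DataFrame([get_distances_indices(i) for i in movies['index']])
nested.insert(0, 'movie', movies['movie'])

# Explode both list columns together (lists in a row must have equal length),
# then restore numeric dtypes, since explode leaves object columns
result = (
    nested.explode(['index', 'distance'], ignore_index=True)
          .astype({'index': 'int64', 'distance': 'float64'})
)
print(result)
```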
