
Overview: I am creating a recommendation system that compares a course a student has already taken against a catalog of available courses the student has not yet taken. The system returns 3 recommended courses.

Issue: I call a custom recommendation function, which returns 3 values, inside a for loop that iterates through a transcript of classes already taken. For each transcript row, the loop finds the 3 classes the student should take next. The problem is that all 3 classes end up in a single cell, and I have not found an easy way to split that cell into separate columns.

Deeper dive:

I have a function (c_recommend) that returns 3 recommendations in the form of a series:

output: Series

INDEX  Program Title
123    program 1
456    program 2
789    program 3

I then use this function (c_recommend) inside a for loop that iterates over the rows of a transcript, taking each course title and comparing it to the catalog of classes.

## create an empty list
results = list()

## run through the transcript
for i in transcript.index:
    ## append the student's name, the course already taken, and the recommended courses (3 will appear)
    results.append([transcript['student'].loc[i], transcript['Course'].loc[i], c_recommend(transcript['Course'].loc[i])])

output: List

Student  Taken Class  Recommended Classes
111      program 1    program 2, program 3, program 4
222      program 2    program 5, program 1, program 3
333      program 3    program 2, program 1, program 4

The recommended classes are all bunched into one cell because c_recommend returns three values at once. I need a way to separate those 3 values into their own columns, like so:

desired output:

Student  Taken Class  Recommended Classes  Reco Class 2  Reco Class 3
111      program 1    program 2            program 3     program 4
222      program 2    program 5            program 1     program 3
333      program 3    program 2            program 1     program 4

I have tried converting the list to a pandas DataFrame and splitting it, using regex to split on the commas, and using nested loops. Alas, I have failed and the columns do not separate :( Ideally, once this issue is fixed, I would like to convert the result to a pandas DataFrame. Maybe there is an easier way to handle this with pandas?

I would appreciate any and all insight, even if that means rewriting my function.

TIA!

1 Answer


The best option would probably be to output a Series (i.e., several columns) directly in your initial transformation.
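
A minimal sketch of that idea, assuming c_recommend returns a Series of three titles as described in the question. The c_recommend body, the catalog, and the transcript data below are made-up stand-ins, since the real ones were not shown; the only point is the `*` unpacking inside the loop, which puts each recommendation in its own position in the row.

```python
import pandas as pd

# Hypothetical stand-in for the question's c_recommend: returns a Series of
# three program titles indexed by program ID (the real logic is not shown).
def c_recommend(course):
    catalog = pd.Series(
        ["program 1", "program 2", "program 3", "program 4", "program 5"],
        index=[123, 456, 789, 101, 112],
        name="Program Title",
    )
    # Drop the course already taken and keep the first three remaining titles.
    return catalog[catalog != course].head(3)

# Sample transcript, made up for illustration.
transcript = pd.DataFrame(
    {"student": [111, 222, 333],
     "Course": ["program 1", "program 2", "program 3"]}
)

rows = []
for student, course in zip(transcript["student"], transcript["Course"]):
    # Unpack the three recommended titles so each becomes its own list element.
    rows.append([student, course, *c_recommend(course)])

result = pd.DataFrame(
    rows,
    columns=["Student", "Taken Class",
             "Reco Class 1", "Reco Class 2", "Reco Class 3"],
)
print(result)
```

Because each row is already a flat list of five values, the DataFrame constructor yields one column per recommendation and no splitting step is needed afterwards.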

Since you did not provide details on that transformation, here is a way to rework your first output:

(df.drop('Recommended Classes', axis=1)
   .join(df['Recommended Classes']
           .str.split(', ', expand=True)  # split list of recommended classes
           .rename(columns=lambda x: x+1) # increment column name
           .add_prefix('Reco Class ')     # add prefix to column name
        )
)

output:

   Student  Taken Class  Reco Class 1 Reco Class 2 Reco Class 3
0       111   program 1     program 2    program 3    program 4
1       222   program 2     program 5    program 1    program 3
2       333   program 3     program 2    program 1    program 4
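
The split above assumes each cell of 'Recommended Classes' is a single comma-separated string. If the cells instead hold list-like objects (which may be the case when the loop appends whatever c_recommend returns), a variant along these lines might work. This is only a sketch: the data is made up to mirror the table in the question, and the cells are assumed to be plain Python lists of equal length.

```python
import pandas as pd

# Made-up data mirroring the question's table, but with each cell of
# "Recommended Classes" holding a list instead of one joined string.
df = pd.DataFrame({
    "Student": [111, 222, 333],
    "Taken Class": ["program 1", "program 2", "program 3"],
    "Recommended Classes": [
        ["program 2", "program 3", "program 4"],
        ["program 5", "program 1", "program 3"],
        ["program 2", "program 1", "program 4"],
    ],
})

# Build one column per list position, keeping the original row index.
reco = pd.DataFrame(df["Recommended Classes"].tolist(), index=df.index)
reco.columns = [f"Reco Class {i + 1}" for i in reco.columns]

# Replace the list column with the expanded columns.
out = df.drop(columns="Recommended Classes").join(reco)
print(out)
```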
mozway