3

I've got a complex melt function that I need to do. I've tried stepping through it and checking other questions, but I am very stuck!

Here is my current dataframe:

         1   2   4   5   6   10   24
Userid

u_A      0   1   3   0   0    0    5
u_B      0   0   0   0   0    0    1

and I need to get to this:

Userid    Movieid    Rating
u_A          2         1
u_A          4         3
u_A          24        5
u_B          24        1

So, to clarify, removing 0 valued ratings and having each rating on a separate row, accessible by the Userid. I know that usually the columns axis also has a name (in this case, that series at the top), but I've somehow managed to lose that in my processing.

The closest I got was with the statement fold_1_df.melt(var_name=' movie_id', value_name=' rating') but the format is still not complete

Alex K
  • 129
  • 6

3 Answers3

1

You can .reset_index(), melt and set index again. Then filter the dataframe by rating:

x = (
    df.reset_index()
    .melt(id_vars="Userid", var_name="movie_id", value_name="rating")
    .set_index("Userid")
)
print(x[x["rating"] > 0])

Prints:

       movie_id  rating
Userid                 
u_A           2       1
u_A           4       3
u_A          24       5
u_B          24       1
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
1

I've found a workaround at this point which looks like this:

fold_1_df = fold_1_df.melt(ignore_index = False,var_name=' movie_id', value_name=' rating') 
fold_1_df[fold_1_df[' rating'] != 0]
Alex K
  • 129
  • 6
0

You can replace 0 with NaN and then use df.stack()

df.rename_axis("movie_id", axis=1).replace(0, np.nan).stack().reset_index(
    name="rating"
)

  Userid movie_id  rating
0    u_A        2     1.0
1    u_A        4     3.0
2    u_A       24     5.0
3    u_B       24     1.0
Ch3steR
  • 20,090
  • 4
  • 28
  • 58