1

I have a pandas dataframe like

user_id music_id has_rating
A a 1
B b 1

and I would like to automatically add new rows for each of user_id & music_id for those users haven't rated, like

user_id music_id has_rating
A a 1
A b 0
B a 0
B b 1

for each of user_id and music_id combination pairs those are not existing in my Pandas dataframe yet.

is there any way to append such rows automatically like this?

DS Park
  • 159
  • 1
  • 11
  • 1
    To make sure I understand the question, you'd want to go and find all combinations of `user_id` and `music_id`, and for those that don't exist, add a new line with `has_rating` = 0? – Zach J. Nov 03 '22 at 21:48
  • https://stackoverflow.com/questions/15888648/is-it-possible-to-insert-a-row-at-an-arbitrary-position-in-a-dataframe-using-pan – raja Nov 03 '22 at 21:49
  • Hi Zach, yes - if the combination pair of user_id and music_id already exists, no need to add, but if it doesn't exist, I would like to add a new line with has_rating = 0. – DS Park Nov 03 '22 at 21:52

2 Answers2

1

You can use a temporary reshape with pivot_table and fill_value=0 to fill the missing values with 0:

(df.pivot_table(index='user_id', columns='music_id',
                values='has_rating', fill_value=0)
   .stack().reset_index(name='has_rating')
 )

Output:

  user_id music_id  has_rating
0       A        a           1
1       A        b           0
2       B        a           0
3       B        b           1
mozway
  • 194,879
  • 13
  • 39
  • 75
0

Try using pd.MultiIndex.from_product()

l = ['user_id','music_id']

(df.set_index(l)
.reindex(pd.MultiIndex.from_product([df[l[0]].unique(),df[l[1]].unique()],names = l),fill_value=0)
.reset_index())

Output:

  user_id music_id  has_rating
0       A        a           1
1       A        b           0
2       B        a           0
3       B        b           1
rhug123
  • 7,893
  • 1
  • 9
  • 24