DataFrame Operation very ineficient with loop, don't know how to fix it

Question

I have this loop where i take questions from a DataFrame df_labels that looks like that for session 21100511290882536:

"session_id"      "question" "correct"
21100511290882536   1   1
21100511290882536   2   1
21100511290882536   3   1
21100511290882536   4   1
21100511290882536   5   0
21100511290882536   6   1
21100511290882536   7   1
21100511290882536   8   0
21100511290882536   9   1
21100511290882536   10  1
21100511290882536   11  1
21100511290882536   12  1
21100511290882536   13  0
21100511290882536   14  1
21100511290882536   15  1
21100511290882536   16  1
21100511290882536   17  1
21100511290882536   18  1

and i would like to convert it and all other sessions to a dataframe like this:

"session_id"       "q_1"   "q_2"  "q_3"   "q_4" ...
21100511290882536    1        1     1       1   ...

knowing i already have a DataFrame "df_sessions" listing all the sessions

for session in df_sessions.session_id:
    for i in range(1,19):
        df_sessions[f'q_{i}'][df_sessions['session_id'] == session] = df_labels.correct[(df_labels['session_id'] == session) & (df_labels['question'] == i)]

This code works but is very ineficient, it takes 20 minutes for such an operation and i might need to do more operation like this thus leading to very poor code efficiency and computing time. Thx in advance for your help !

This should be a oneliner: have a look at [`pivot`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html) or [`unstack`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.unstack.html) — Harry Haller, Feb 16 '23 at 09:12

score 2 · Answer 1 · answered Feb 16 '23 at 09:14

Try:

>>> (df.assign(question='q_' + df['question'].astype(str).str.zfill(2))
       .pivot(index='session_id', columns='question', values='correct')
       .rename_axis(columns=None).reset_index())
       
          session_id  q_01  q_02  q_03  q_04  q_05  q_06  q_07  q_08  q_09  q_10  q_11  q_12  q_13  q_14  q_15  q_16  q_17  q_18
0  21100511290882536     1     1     1     1     0     1     1     0     1     1     1     1     0     1     1     1     1     1

DataFrame Operation very ineficient with loop, don't know how to fix it

1 Answers1