0

I need help rearranging a data frame using pandas. Currently it is disorganized and I would like to organize it better. This is my current DataFrame:

    username question answer  pre    post
0      a      Q1       abc    2.2    3.3
1      a      Q2       bcd    2.2    3.3
2      a      Q3       deg    2.2    3.3
3      b      Q1       cde    3.1    3.5   
4      b      Q2       cee    3.1    3.5  

There are 5000+ rows, 21 unique questions, 350 unique usernames. All answers are different, but the pre and post floats are the same for the same user (regardless of question/answer).

I would like help to change the data frame to this:

    username    Q1    Q2    Q3  (continued to Q21) pre  post
0     a        abc   bcd   deg    ......           2.2  3.3
1     b        cde   cee   ...    ......           3.1  3.5
2     c  ....
3     d  ....


jchan
  • 1
  • `df.pivot(index='username', columns= 'question', values='answer').merge(df, how='left', on='username').drop_duplicates(subset=['username', 'pre', 'post']).drop(['answer', 'question'], axis=1)` – David Erickson Jun 30 '20 at 23:26

2 Answers2

0

You can pivot on the question column:

x[['username','question','answer']].set_index('username').pivot(columns = 'question').droplevel(0, axis = 1)

Can merge in the other information with this:

x[['username','question','answer']].set_index('username').pivot(columns = 'question').droplevel(0, axis = 1).merge(x.set_index('username').drop(['question','answer'], axis = 1).drop_duplicates(), left_index = True, right_index = True).reset_index()



 username   Q1   Q2   Q3  pre  post
0        a  abc  bcd  deg  2.2   3.3
1        b  cde  cee  NaN  3.1   3.5
Adam Zeldin
  • 898
  • 4
  • 6
0

It should work if you use pivot_table() ,setting your username, pre and post as index, since you said they are the same:

df = pd.DataFrame({'username':np.repeat(['a','b','c'],3),
              'question':['Q1','Q2','Q3']*3,
              'answer':np.random.choice(['x','y','z'],9),
              'pre':np.repeat([1,2,3],3),'post':np.repeat([1.5,2.5,3.5],3)})

      username  question answer pre post
    0   a   Q1  x   1   1.5
    1   a   Q2  y   1   1.5
    2   a   Q3  x   1   1.5
    3   b   Q1  y   2   2.5
    4   b   Q2  z   2   2.5
    5   b   Q3  x   2   2.5
    6   c   Q1  x   3   3.5
    7   c   Q2  y   3   3.5
    8   c   Q3  z   3   3.5

df.pivot_table(index=['username','pre','post'],values='answer',columns='question',aggfunc=np.unique).reset_index()

  question  username    pre post    Q1  Q2  Q3
0   a   1   1.5 x   y   x
1   b   2   2.5 y   z   x
2   c   3   3.5 x   y   z
StupidWolf
  • 45,075
  • 17
  • 40
  • 72