
Example df:

import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3'],
    'dialog': ['answer1', 'answer2', 'answer1', 'answer2', 'answer3', 'answer4', 'answer1', 'answer2', 'answer3', 'answer4', 'answer5', 'answer6']
})

I want to group it by id and then turn each consecutive pair of answers into one row (the number of answers in each group is always even), like this, but I have no idea how to do it:

id phrase1 phrase2
1  answer1 answer2
2  answer1 answer2
2  answer3 answer4
3  answer1 answer2
3  answer3 answer4
3  answer5 answer6
Contra111

2 Answers


You can try:

(df.set_index(['id', df.index // 2, (df.index % 2) + 1])['dialog']
   .unstack()
   .add_prefix('phrase')
   .reset_index(level=1, drop=True))

Output:

    phrase1  phrase2
id                  
1   answer1  answer2
2   answer1  answer2
2   answer3  answer4
3   answer1  answer2
3   answer3  answer4
3   answer5  answer6
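
The expression above pairs rows by their position in the whole frame (`df.index // 2`), so it relies on the default 0..n-1 index. A minimal sketch of a variant, not part of the original answer, that numbers rows inside each id group with `groupby(...).cumcount()` instead, which also answers the "can this be done with groupby" question in the comments. The names `pos` and `result` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3'],
    'dialog': ['answer1', 'answer2', 'answer1', 'answer2', 'answer3', 'answer4',
               'answer1', 'answer2', 'answer3', 'answer4', 'answer5', 'answer6'],
})

# Position of each row inside its own id group: 0, 1, 2, ...
pos = df.groupby('id').cumcount()

result = (
    # Index levels: id, pair number within the group, slot (1 or 2) within the pair
    df.set_index(['id', pos // 2, pos % 2 + 1])['dialog']
      .unstack()                          # slot level becomes the columns 1 and 2
      .add_prefix('phrase')               # columns become phrase1, phrase2
      .reset_index(level=1, drop=True)    # drop the pair-number level
)
print(result)
```

Because the pair number is computed per group, this version should not depend on the frame's existing index or on where each group starts.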
Scott Boston

Since the number of answers per id is always even, you can simply slice out the even-positioned and odd-positioned rows and concat them side by side:

# Use a new name so that df keeps its default integer index
paired = df.set_index("id")

print(pd.concat([paired.iloc[::2], paired.iloc[1::2]], ignore_index=True, axis=1)
        .rename(columns={0: "phrase1", 1: "phrase2"}))

    phrase1  phrase2
id                  
1   answer1  answer2
2   answer1  answer2
2   answer3  answer4
3   answer1  answer2
3   answer3  answer4
3   answer5  answer6

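For a df with an odd number of rows (the output below is for the example df with one extra row, answer7, appended to id 3; df must still have its default integer index here):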

s = df.groupby(["id", df.index // 2], as_index=False).agg(list)

# drop(columns=...) replaces the positional axis argument, which pandas 2.0 removed
print(pd.concat([s, pd.DataFrame(s["dialog"].tolist())], axis=1).drop(columns="dialog"))

  id        0        1
0  1  answer1  answer2
1  2  answer1  answer2
2  2  answer3  answer4
3  3  answer1  answer2
4  3  answer3  answer4
5  3  answer5  answer6
6  3  answer7     None
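
Grouping on `df.index // 2` pairs rows by their position in the whole frame, so it lines up with the id groups only when every group starts at an even position; an odd-sized group in the middle would shift the pairs of every later group. A hedged sketch of one way around that, pairing by position within each group instead. The frame `df_odd` and the names `pos`, `pairs` and `out` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: id '2' has an odd number of answers and is not the last group
df_odd = pd.DataFrame({
    'id': ['1', '1', '2', '2', '2', '3', '3'],
    'dialog': ['answer1', 'answer2', 'answer1', 'answer2', 'answer3',
               'answer1', 'answer2'],
})

# Row position inside each id group, so pairs never straddle two ids
pos = df_odd.groupby('id').cumcount()

# One list of (at most) two answers per (id, pair number)
pairs = df_odd.groupby(['id', pos // 2])['dialog'].agg(list)

# Expand the lists into columns; shorter lists are padded with missing values
out = pd.DataFrame(pairs.tolist(), index=pairs.index.get_level_values(0))
out.columns = [f'phrase{i + 1}' for i in out.columns]
out = out.rename_axis('id').reset_index()
print(out)
```

The unpaired last answer of id '2' ends up in its own row with a missing `phrase2`, and id '3' is still paired correctly.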
Henry Yik
  • Looks fine! But is there a way to do it with groupby, or any solution for the case where the number of answers is not even? – Contra111 Sep 04 '20 at 14:25
  • Why do I get 'InvalidIndexError: Reindexing only valid with uniquely valued Index objects'? I use a df of the same shape but with different data – Contra111 Sep 04 '20 at 14:34
  • How about `df = df.set_index("id")['dialog']` and `pd.DataFrame({'phrase1': df[::2], 'phrase2': df[1::2]})` – Mark Wang Sep 04 '20 at 14:40
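
The InvalidIndexError in the comment above most likely comes from index alignment: `pd.concat(..., axis=1)` has to align the two slices on their index, and that fails when the index has duplicate values (the repeated ids) and the two slices do not carry exactly the same labels. A minimal sketch, offered as an assumption rather than a confirmed diagnosis of that data, that sidesteps alignment by slicing plain NumPy arrays. It still assumes an even number of rows and that no pair straddles two ids:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3'],
    'dialog': ['answer1', 'answer2', 'answer1', 'answer2', 'answer3', 'answer4',
               'answer1', 'answer2', 'answer3', 'answer4', 'answer5', 'answer6'],
})

# Plain arrays carry no index, so nothing needs to be aligned
ids = df['id'].to_numpy()
answers = df['dialog'].to_numpy()

# Assumption: rows come in complete pairs
assert len(df) % 2 == 0

result = pd.DataFrame({
    'id': ids[::2],            # id of the first answer in each pair
    'phrase1': answers[::2],   # rows 0, 2, 4, ...
    'phrase2': answers[1::2],  # rows 1, 3, 5, ...
})
print(result)
```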