0

I have consecutive row duplicates in two column. I want to delete the second row duplicate based on [col1,col2] and move the value of another column to a new one.

Example:

Input

col1 col2 col3
  X    A   1
  X    A   2
  Y    A   3
  Y    A   4
  X    B   5
  X    B   6
  Z    C   7
  Z    C   8

Output

col1 col2 col3 col4
 X    A    1    2
 Y    A    3    4
 X    B    5    6
 Z    C    7    8

I found out about pivoting but I am struggling to understand how to add another column and avoid indexing, I would to preserve everything as written in the example

Tway101
  • 39
  • 2
  • 5

1 Answers1

0

This is similar to Question 10 here:

(df.assign(col=df.groupby(['col1','col2']).cumcount())
   .pivot_table(index=['col1','col2'], columns='col', values='col3')
   .reset_index()
)

Output:

col col1 col2  0  1
0      X    A  1  2
1      X    B  5  6
2      Y    A  3  4
3      Z    C  7  8
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74