I would like to add a repeat count for duplicate rows. My current example only drops the duplicate rows.

import re
import pandas as pd

data = [
    ['A','B','C'],
    ['A','B','C'],
    ['A','D','C'],
    ['A','D','C']
]

df = pd.DataFrame(data,columns=['x1','x2','x3'])
print(df)

df1 = df.drop_duplicates(keep='first')
print(df1)

Expected output:

  x1 x2 x3
0  A  B  C
1  A  B  C
2  A  D  C
3  A  D  C
  x1 x2 x3  count
0  A  B  C      2
2  A  D  C      2
lucky1928

2 Answers

Try this line:

df1 = df.groupby(['x1', 'x2', 'x3']).size().reset_index(name='count')

Here, we first group by x1, x2, and x3, so each distinct row appears only once in the result, and then size() counts how many original rows fall into each group.
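Note that grouping replaces the original index with 0, 1, ..., whereas the expected output in the question keeps the index labels of the first occurrences (0 and 2). If that matters, one possible variation is sketched below; the names cols, counted, and result are illustrative and not part of the original answer.

```python
import pandas as pd

cols = ['x1', 'x2', 'x3']
df = pd.DataFrame(
    [['A', 'B', 'C'], ['A', 'B', 'C'], ['A', 'D', 'C'], ['A', 'D', 'C']],
    columns=cols,
)

# Attach the size of each row's group to every row of the frame.
counted = df.assign(count=df.groupby(cols)[cols[0]].transform('size'))

# Keep only the first occurrence of each row; its original index label survives.
result = counted.drop_duplicates(subset=cols, keep='first')
print(result)
```

Because transform returns a value for every original row, the count lines up with the frame before duplicates are dropped, so drop_duplicates can keep the first row of each group with its index intact.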

Ehsan Hamzei

You can use the pandas method duplicated to flag the repeated rows, then groupby, count, and add 1 (to account for the first appearance), followed by some formatting to match your expected output:

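import pandas as pd

data = [
    ['A','B','C'],
    ['A','B','C'],
    ['A','D','C'],
    ['A','D','C']
]

df = pd.DataFrame(data,columns=['x1','x2','x3'])
print(df)

# Flag every repeat of an earlier row (the first occurrence is not flagged)
df['dup'] = df.duplicated()
df = df[df.dup]
# Count the repeats in each group, then add 1 for the first occurrence
df = (df.drop(columns=['dup']).groupby(['x1', 'x2', 'x3']).size() + 1).reset_index(name='count')

print(df)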

Output:

  x1 x2 x3
0  A  B  C
1  A  B  C
2  A  D  C
3  A  D  C
  x1 x2 x3  count
0  A  B  C      2
1  A  D  C      2
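
One caveat with filtering on duplicated first: a row that occurs only once is never flagged, so it drops out of the result entirely. The sketch below illustrates the difference on a small frame that contains non-repeated rows; the data and the names repeats, dup_only, and all_rows are made up for illustration.

```python
import pandas as pd

cols = ['x1', 'x2', 'x3']
df = pd.DataFrame(
    [['A', 'B', 'C'], ['A', 'B', 'C'], ['A', 'D', 'C'], ['A', 'E', 'C']],
    columns=cols,
)

# Approach from this answer: keep only repeated occurrences, count them, add 1.
repeats = df[df.duplicated()]
dup_only = (repeats.groupby(cols).size() + 1).reset_index(name='count')

# Counting every group directly also keeps rows that occur once (count == 1).
all_rows = df.groupby(cols).size().reset_index(name='count')

print(dup_only)
print(all_rows)
```

Which behavior is preferable depends on whether rows without duplicates should appear in the output with a count of 1 or be left out.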
100tifiko