Count the number of duplicates in a column and list them out

Question

I want to count the number of duplicated IP addresses in my column and list them out. What I have so far is:

authorized = df_log[df_log['Access Type'] == 'Authorized']
authorized = authorized.groupby('host/IP address')\
.size().reset_index(name='No. of times Duplicated')

The problem with this is that it displays the counts of all the IP addresses, even those that appear just once, for example:

So I guess I need to keep only the rows where size > 1. Also, the number of times duplicated should be 1 less for each IP. The output I want is similar to the picture, except that number 21 is gone and all the numbers are reduced by 1.

You can subtract 1 using `authorized = authorized - 1`, assuming it is a Series, and then filter with `authorized = authorized.loc[authorized.ne(0)]` – Space Impact, May 09 '20 at 18:57
The first snippet reduces authorized by 1, and the second keeps only the values that are not equal to 0. – Space Impact, May 09 '20 at 19:02
Could this question help you: https://stackoverflow.com/q/14657241/12744275 – Renaud, May 09 '20 at 19:17
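
The approach from the comments above can be sketched roughly as follows. This is only an illustration: the sample rows are made up, and it assumes the counts are kept as a Series (before `reset_index`) so that subtracting 1 is valid. On a Series, plain boolean indexing is enough for the filter step.

```python
import pandas as pd

# Hypothetical sample log; column names follow the question, rows are made up.
df_log = pd.DataFrame({
    "host/IP address": [
        "10.0.0.1", "10.0.0.1", "10.0.0.1",
        "10.0.0.2", "10.0.0.3", "10.0.0.3", "10.0.0.4",
    ],
    "Access Type": ["Authorized"] * 6 + ["Unauthorized"],
})

authorized = df_log[df_log["Access Type"] == "Authorized"]

# Occurrences per IP, as a Series indexed by IP.
counts = authorized.groupby("host/IP address").size()

# Subtract 1 so the first occurrence is not counted, then drop IPs left at 0.
duplicated = counts - 1
duplicated = duplicated[duplicated.ne(0)]

# Only now convert to a two-column DataFrame.
result = duplicated.reset_index(name="No. of times Duplicated")
print(result)
```

The filter could equally be written as `counts[counts > 1] - 1`; both forms leave only IPs that occur at least twice.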

Rostyslav Shevchenko · Accepted Answer · 2020-05-09T19:22:37.397

2

Why not use value_counts() and duplicated():

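ips = authorized['host/IP address']
# select the repeats: every occurrence of an IP after its first one
duplicated_ips = ips[ips.duplicated()]
# count them: each IP's count is its total number of occurrences minus one
counts_duplicated_ips = duplicated_ips.value_counts()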
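
To see why this matches the "reduced by 1" requirement from the question, here is a small self-contained run of the same two calls. The IP values are invented for illustration, and the final `rename_axis`/`reset_index` step is an optional addition, not part of the answer above.

```python
import pandas as pd

# Made-up IPs for illustration only.
ips = pd.Series(
    ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.3"],
    name="host/IP address",
)

# duplicated() marks every occurrence after the first (keep="first" is the default).
duplicated_ips = ips[ips.duplicated()]

# Counting the marked rows gives "occurrences minus one" per IP;
# IPs that appear only once never get marked, so they drop out.
counts_duplicated_ips = duplicated_ips.value_counts()

# Optional: turn the result into a two-column table like the one in the question.
table = counts_duplicated_ips.rename_axis("host/IP address").reset_index(
    name="No. of times Duplicated"
)
print(table)
```

Here `10.0.0.1` appears three times and is reported with 2, `10.0.0.3` appears twice and is reported with 1, and `10.0.0.2` is not listed.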

P.S. Thanks @tidakdiinginkan in the comments.

edited May 09 '20 at 19:22

answered May 09 '20 at 19:15

I'd say this - [pandas.DataFrame.duplicated](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.duplicated.html) would be better. – tidakdiinginkan May 09 '20 at 19:17
@tidakdiinginkan right, but he needs the counts, so probably the best is to combine. – Rostyslav Shevchenko May 09 '20 at 19:19
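
One possible way to combine the two ideas from these comments is sketched below, using `DataFrame.duplicated` with `subset` to list the rows and a `groupby` to count them. The sample rows and the `request` column are hypothetical.

```python
import pandas as pd

# Hypothetical data; the IP column name is from the question, the rows are made up.
authorized = pd.DataFrame({
    "host/IP address": [
        "10.0.0.1", "10.0.0.2", "10.0.0.1",
        "10.0.0.3", "10.0.0.3", "10.0.0.1",
    ],
    "request": ["a", "b", "c", "d", "e", "f"],  # hypothetical extra column
})

col = "host/IP address"

# keep=False flags every row whose IP occurs more than once, first occurrence included.
all_dup_rows = authorized[authorized.duplicated(subset=[col], keep=False)]

# The default keep="first" flags only the repeats, so each group size is count - 1.
repeat_rows = authorized[authorized.duplicated(subset=[col])]
repeat_counts = repeat_rows.groupby(col).size()

print(all_dup_rows)
print(repeat_counts)
```

`all_dup_rows` lists the full rows for inspection, while `repeat_counts` gives the per-IP number of repeats.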

score 0 · Answer 2 · answered May 09 '20 at 19:20

Try this:

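# flag every repeat occurrence of an IP (the first occurrence is not flagged)
df['dups'] = df['IP'].duplicated()
# count the flagged rows per access type
print(df[df['dups']].groupby(by=['Access']).count())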

answered May 09 '20 at 19:20

NYC Coder
