0

I want to count the number of duplicated IP address in my column and list them out. What i have so far is:

authorized = df_log[df_log['Access Type'] == 'Authorized']
authorized = authorized.groupby('host/IP address')\
.size().reset_index(name='No. of times Duplicated')

The problem with this is that it displays the counts of all the IP addresses even those that appear just once for example:

enter image description here

So I'm trying to filter out where size > 1 I guess. Also, the number of times duplicated should be 1 less for each number. The output i want is similar to the picture except number 21 is gone and all the numbers are reduced by 1

Aaron
  • 165
  • 8

2 Answers2

2

Why not use value_counts() and duplicated():

ips = authorized['host/IP address']
# select duplicated IPS
duplicated_ips = ips[ips.duplicated()]
# count them
counts_duplicated_ips = duplicated_ips.value_counts()

P.S. Thanks @tidakdiinginkan in the comments.

0

Try this:

df['dups'] = df['IP'].duplicated()
print(df[df['dups']==True].groupby(by=['Access']).count())
NYC Coder
  • 7,424
  • 2
  • 11
  • 24