I have an Excel spreadsheet that I load into a DataFrame. When I run my code, I can't get the DataFrame to sort correctly on port_count. I'm trying to sort on port_count and then display the ports that are open for each IP address. This code is almost there, but the sorting is giving me a problem.

import pandas as pd
import openpyxl as xl
data = {'IP':  ['192.168.1.1','192.168.1.1','192.168.1.1','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','10.10.10.10','192.168.5.3','192.168.5.3','192.168.4.6','192.168.4.6','192.168.4.7','192.168.4.7','192.168.8.9','192.168.8.9','10.10.2.3','10.10.2.3','10.5.2.3','10.5.2.3','10.1.2.3','10.1.2.3','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','1.2.3.4','4.5.6.7','4.5.6.7','4.5.6.7','4.5.6.7','4.5.6.7','192.168.9.10','192.168.9.10','192.168.9.11','192.168.9.11','192.168.9.12','192.168.9.12','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','10.1.5.6','8.9.10.11','8.9.10.11','8.9.10.11','2.8.3.9','2.8.3.9','12.13.14.15','13.14.15.16','13.14.15.16','74.208.236.41','74.208.236.41','74.208.236.41','3.234.139.2','3.234.139.2','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','172.67.173.229','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','192.168.60.23','1.2.3.6','192.168.9.6','192.168.9.6','172.16.54.65','172.16.54.65','172.16.54.65','172.16.54.65','172.16.54.66','172.16.54.66','172.16.54.66','172.16.85.36','172.16.85.36','10.10.12.12','10.10.12.12'],
        'Port': ['22','80','443','80','443','2082','2083','2086','2087','2095','8080','8443','8880','80','443','80','443','80','443','80','443','80','443','80','443','80','443','21','22','25','80','110','143','443','465','587','993','995','2082','2086','2087','2096','3306','25','80','443','465','587','80','443','80','443','80','443','80','443','2052','2053','2082','2083','2086','2087','2096','8080','8443','8880','5222','8008','8443','80','443','80','80','443','80','81','443','80','443','80','443','2082','2083','2087','8443','8880','80','443','2052','2053','2082','2083','2086','2087','2096','8080','8443','8880','80','80','443','80','82','83','443','80','82','443','80','443','80','443'],
        }
df = pd.DataFrame(data)
df['port_count'] = df.groupby('IP')['Port'].transform('count')
df['port_count'] = df['port_count'].astype(int)
df.sort_values(by=['port_count'], ascending=False, inplace=True)
pivot1 = df.pivot_table(df, index=['IP', 'Port'], columns=None, fill_value=0).sort_values(by='port_count', ascending=False)
if df.size != 0:
    with pd.ExcelWriter("/testing/test.xlsx", mode="a", engine="openpyxl", if_sheet_exists='replace') as writer:
        pivot1.to_excel(writer,sheet_name="IP to Port")

Current output looks like this: https://www.hopticalillusion.co/shared-files/730/?test_output.xlsx

Desired Output: https://www.hopticalillusion.co/shared-files/731/?desired_test_output.xlsx

timlaw71
  • 2
    Please share a reproducible sample of your data set. – Anoushiravan R Sep 30 '22 at 19:36
  • what is the problem? it looks sorted fine – alec_djinn Sep 30 '22 at 19:37
  • It needs to sort on the port_count column. The 10.10.16.5 should be under the column with 10. – timlaw71 Sep 30 '22 at 19:39
  • @AnoushiravanR I added a link to the input file. – timlaw71 Sep 30 '22 at 19:58
  • Refrain from showing your dataframe as an image. Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Sep 30 '22 at 20:12
  • 2
    It looks to me that the port count column is being interpreted as a string. It also looks like you are trying to sort it numerically. – itprorh66 Sep 30 '22 at 20:14
  • 1
    @itprorh66 I thought it was fixed, but sadly it still isn't grouping the IPs with their ports correctly. I'm going to update the code in the post to what I have so far. – timlaw71 Sep 30 '22 at 20:36
  • 1
    **DO NOT** post links to your data. Copy or type a sample of the text into the question so that we can copy and paste it into a test environment. – itprorh66 Sep 30 '22 at 23:23
  • @itprorh66 I don't know how to get the format to look right, so I had to post links. Sorry. – timlaw71 Sep 30 '22 at 23:34

1 Answer

Maybe try the following:

df['port_count'] = df['port_count'].astype(int)
df.sort_values(by=['port_count'], ascending=False, inplace=True)
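
If the goal is to keep each IP's rows together within a given port_count, one possible extension of this idea (a sketch that is not from the original post) is to sort on several keys at once. The small sample data and the name `ordered` below are assumptions made up for illustration. Ports are cast to int so that they sort numerically rather than as text.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data.
df = pd.DataFrame({
    "IP": ["192.168.1.1", "192.168.1.1", "192.168.1.1",
           "10.1.2.3", "10.1.2.3",
           "12.13.14.15",
           "192.168.5.3", "192.168.5.3"],
    "Port": ["22", "80", "443", "80", "443", "80", "80", "443"],
})

# Cast ports to integers so 80 sorts before 443 (as strings, "443" < "80").
df["Port"] = df["Port"].astype(int)

# Number of ports seen per IP, repeated on every row of that IP.
df["port_count"] = df.groupby("IP")["Port"].transform("count")

# Sort by count (largest first), then by IP so each IP's rows stay
# together, then by port number within each IP.
ordered = (
    df.sort_values(by=["port_count", "IP", "Port"],
                   ascending=[False, True, True])
      .reset_index(drop=True)
)

print(ordered)
```

Note that the IP column is still compared as plain text here, so the order of IPs within one port_count is lexicographic, not numeric by octet.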
jp207