
Hi, I need some assistance. I have a file that contains the following information:

IP,Ports,count
"192.168.0.1","80 8980 6789 443 4778 3556 7778 4432 5674 7786 2234 6678 33245 7788 3332 6678 3322 5432 5567",19
"192.168.0.2","80 8980 6789 443 4778 3556 7778 4432 5674 7786 2234 6678 33245 7788 3332 6678 3322 5432 5567",19
"192.168.0.3","80 8980 6789 443 4778 3556 7778 4432 5674 7786 2234 6678 33245 7788 3332 6678 3322 5432 5567",19
"192.168.0.4","80 8980 6789 443 4778 3556 7778 4432 5674 7786 2234 6678 33245 7788 3332 6678 3322 5432 5567",19

I want to split the ports into groups of five and write each group, together with its IP, on its own line in a new file.

Expected result:

IP,Ports
192.168.0.1 80,8980,6789,443,4778
192.168.0.1 3556,7778,4432,5674,7786
192.168.0.1 2234,6678,33245,7788,3332
192.168.0.1 6678,3322,5432,5567
192.168.0.2 80,8980,6789,443,4778
192.168.0.2 3556,7778,4432,5674,7786
192.168.0.2 2234,6678,33245,7788,3332
192.168.0.2 6678,3322,5432,5567
192.168.0.3 80,8980,6789,443,4778
192.168.0.3 3556,7778,4432,5674,7786
192.168.0.3 2234,6678,33245,7788,3332
192.168.0.3 6678,3322,5432,5567
192.168.0.4 80,8980,6789,443,4778
192.168.0.4 3556,7778,4432,5674,7786
192.168.0.4 2234,6678,33245,7788,3332
192.168.0.4 6678,3322,5432,5567

To be honest, I have no idea how to do this or where to start. Kindly assist.

Either AWK or Python is fine. Please just explain what the script or one-liner does so that I can experiment with it.

n00b
    You can use `np.array_split` to split your array into sublists of (nearly) equal length: https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_split.html – Rajat Mishra Apr 22 '20 at 11:58
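Note that `np.array_split(a, k)` takes the number of pieces `k`, not the piece size. Per the NumPy documentation, an array of length `l` split into `k` sections yields `l % k` pieces of size `l//k + 1` followed by pieces of size `l//k`, so it only coincides with fixed chunks of 5 for some lengths. The pure-Python sketch below (hypothetical helper names, no NumPy required) compares the two sizing rules:

```python
from typing import List


def array_split_sizes(length: int, sections: int) -> List[int]:
    """Piece sizes following the rule documented for np.array_split:
    the first length % sections pieces get one extra element."""
    if sections < 1:
        raise ValueError("sections must be at least 1")
    base, extra = divmod(length, sections)
    return [base + 1] * extra + [base] * (sections - extra)


def fixed_chunk_sizes(length: int, size: int) -> List[int]:
    """Piece sizes when cutting into chunks of `size`; the last may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [min(size, length - start) for start in range(0, length, size)]


# 19 ports, as in the question: both rules agree here.
print(fixed_chunk_sizes(19, 5), array_split_sizes(19, 4))
# 21 ports: ceil(21 / 5) = 5 sections, but the sizes differ.
print(fixed_chunk_sizes(21, 5), array_split_sizes(21, 5))
```

For the 19 ports in the question both rules give `[5, 5, 5, 4]`, but with 21 ports fixed-size chunking gives `[5, 5, 5, 5, 1]` while `array_split` into 5 sections gives `[5, 4, 4, 4, 4]`.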

2 Answers


For Python, you could do the following:

Demo:

from csv import DictReader, DictWriter

# Attribution: adapted from https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks/312464#312464
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# Open both input and output files
with open("data.csv", newline='') as f, open("output.csv", mode="w", newline='') as o:

    # Create reading and writing objects
    reader = DictReader(f)
    writer = DictWriter(o, fieldnames=["IP", "Ports"])

    # Write headers
    writer.writeheader()

    # Go through each line from reader object
    for line in reader:

        # Split ports by whitespace into a list of ports
        ports = line["Ports"].split()

        # Go through each chunk(n = 5) of ports
        for port_chunk in chunks(ports, 5):

            # Write row to output CSV file
            row_dict = {"IP": line["IP"], "Ports": ",".join(port_chunk)}
            writer.writerow(row_dict)

output.csv

IP,Ports
192.168.0.1,"80,8980,6789,443,4778"
192.168.0.1,"3556,7778,4432,5674,7786"
192.168.0.1,"2234,6678,33245,7788,3332"
192.168.0.1,"6678,3322,5432,5567"
192.168.0.2,"80,8980,6789,443,4778"
192.168.0.2,"3556,7778,4432,5674,7786"
192.168.0.2,"2234,6678,33245,7788,3332"
192.168.0.2,"6678,3322,5432,5567"
192.168.0.3,"80,8980,6789,443,4778"
192.168.0.3,"3556,7778,4432,5674,7786"
192.168.0.3,"2234,6678,33245,7788,3332"
192.168.0.3,"6678,3322,5432,5567"
192.168.0.4,"80,8980,6789,443,4778"
192.168.0.4,"3556,7778,4432,5674,7786"
192.168.0.4,"2234,6678,33245,7788,3332"
192.168.0.4,"6678,3322,5432,5567"
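The double quotes around the ports in output.csv are expected: `csv.DictWriter` uses `QUOTE_MINIMAL` by default, which quotes any field that contains the delimiter, and every joined port list contains commas. To try the answer's logic without creating any files, here is a minimal sketch that runs the same DictReader/DictWriter steps on in-memory text with `io.StringIO`. The names `split_ports` and `sample` are hypothetical, and the sample is a single row copied from the question.

```python
import csv
import io
from typing import Iterator, List


def chunks(lst: List[str], n: int) -> Iterator[List[str]]:
    """Yield successive n-sized chunks from lst (same helper as the answer)."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


def split_ports(src: str, size: int = 5) -> str:
    """Run the answer's DictReader/DictWriter logic on in-memory CSV text."""
    reader = csv.DictReader(io.StringIO(src))
    out = io.StringIO()
    # lineterminator="\n" keeps the result easy to compare; the csv default is "\r\n".
    writer = csv.DictWriter(out, fieldnames=["IP", "Ports"], lineterminator="\n")
    writer.writeheader()
    for line in reader:
        for port_chunk in chunks(line["Ports"].split(), size):
            # The joined field contains commas, so QUOTE_MINIMAL wraps it in quotes.
            writer.writerow({"IP": line["IP"], "Ports": ",".join(port_chunk)})
    return out.getvalue()


sample = (
    'IP,Ports,count\n'
    '"192.168.0.1","80 8980 6789 443 4778 3556 7778 4432 5674 7786 '
    '2234 6678 33245 7788 3332 6678 3322 5432 5567",19\n'
)
# Prints the header plus the four 192.168.0.1 rows shown in output.csv above.
print(split_ports(sample), end="")
```

Changing `size` or editing `sample` is a quick way to see how the chunking behaves before running the script on the real file.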
RoadRunner
  • I'm getting some errors with the Python script, but I will look into it. Thank you for your help, though! – n00b Apr 22 '20 at 12:53
  • @n00b Oh really? It works fine for me. What kind of errors are you getting? If you're using Windows, you may need to pass the full path to the files, since relative paths are resolved against the current working directory. – RoadRunner Apr 22 '20 at 13:00
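A relative name such as `"data.csv"` is looked up in the current working directory, so the script can fail with `FileNotFoundError` when it is launched from a different folder than the one holding the data. Instead of hard-coding a full path, one option is to build the path from the script's own location. A minimal sketch, assuming the CSV sits next to the script; `next_to_script` is a hypothetical helper name:

```python
from pathlib import Path


def next_to_script(filename: str, script_path: str) -> Path:
    """Return the path of `filename` in the same folder as `script_path`.

    A bare open("data.csv") is resolved against the current working
    directory, which is not necessarily the folder containing the script.
    """
    return Path(script_path).absolute().parent / filename


# Hypothetical usage inside the script itself:
#     with open(next_to_script("data.csv", __file__), newline='') as f:
#         ...
```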

Could you please try the following (written and tested with the samples shown).

awk -F'"|","' '
BEGIN{
  print "IP,Ports"
}
FNR>1{
  num=split($3,array," ")
  for(i=1;i<=num;i++){
    if(i%5==1){ printf("%s%s",$2,OFS) }
    printf("%s%s",array[i],(i%5==0 || i==num)?ORS:",")
  }
}' Input_file

Explanation: here is a detailed, annotated version of the above.

awk -F'"|","' '                                        ##Start the awk program; fields are separated by a double quote or by quote-comma-quote, so $2 is the IP and $3 is the list of ports.
BEGIN{                                                 ##BEGIN block: runs once, before any input is read.
  print "IP,Ports"                                     ##Print the output header.
}
FNR>1{                                                 ##Skip the input header (line 1) and process every other line.
  num=split($3,array," ")                              ##Split the 3rd field on spaces into array; num holds the number of ports.
  for(i=1;i<=num;i++){                                 ##Loop through all ports.
    if(i%5==1){ printf("%s%s",$2,OFS) }                ##At the start of each group of 5 (ports 1, 6, 11, ...), print the IP followed by OFS (a space by default).
    printf("%s%s",array[i],(i%5==0 || i==num)?ORS:",") ##Print the port, followed by a newline (ORS) if it is the 5th port of its group or the last port overall, otherwise by a comma.
  }
}' Input_file                                          ##Name of the input file.
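The awk answer splits each line on a double quote or on quote-comma-quote, then uses each port's position modulo 5 to decide where an output line ends. For comparison, here is a rough Python analogue of that approach, assuming the exact input layout from the question (quoted IP, quoted space-separated ports, unquoted count); `awk_style` is a hypothetical name. One difference worth knowing: Python's `re` tries alternatives left to right and keeps the first that matches, while POSIX awk uses leftmost-longest matching, so the pattern here lists `","` before `"`.

```python
import re
from typing import Iterable, List

# Python's re tries alternatives left to right and keeps the first that
# matches, so '","' is listed before '"' (awk would pick the longest match).
FIELD_SEP = re.compile(r'","|"')


def awk_style(lines: Iterable[str], size: int = 5) -> List[str]:
    """Group ports like the awk answer: 'IP port,port,...' per output line."""
    out = ["IP,Ports"]
    for lineno, line in enumerate(lines, start=1):
        if lineno == 1:  # skip the input header, like awk's FNR>1
            continue
        # '"ip","p1 p2 ...",count' -> ['', 'ip', 'p1 p2 ...', ',count']
        fields = FIELD_SEP.split(line.rstrip("\n"))
        if len(fields) < 3:  # assumption: ignore blank or malformed lines
            continue
        ip, ports = fields[1], fields[2].split()
        group: List[str] = []
        for i, port in enumerate(ports, start=1):
            group.append(port)
            # Close the group after every `size` ports, or at the last port.
            if i % size == 0 or i == len(ports):
                out.append(ip + " " + ",".join(group))
                group = []
    return out


sample = [
    "IP,Ports,count",
    '"192.168.0.1","80 8980 6789 443 4778 3556 7778 4432 5674 7786 '
    '2234 6678 33245 7788 3332 6678 3322 5432 5567",19',
]
for row in awk_style(sample):
    print(row)
```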
RavinderSingh13
  • This works really well. Could you kindly explain it to me? Also, do you have a resource where I can learn AWK? :) – n00b Apr 22 '20 at 12:46
  • @n00b, I have just added a detailed explanation to my answer. For learning `awk`, you could look at https://stackoverflow.com/tags/awk/info, which I refer to myself; it could be helpful to you. – RavinderSingh13 Apr 22 '20 at 12:48
  • @n00b, if you have any questions about the explanation, kindly let me know and I will try my best to help. Cheers. – RavinderSingh13 Apr 22 '20 at 12:49