2

I have two csv files and i am using one csv to search all records from another csv and update its status. I have two tables and looking for IP and PROTOCOL from search table in input.csv. If it is there, then the EXISTS column is updated to 'No'. I am stuck where both the protocol and ip are same for two or more records, but they have different port. It is updating only one record.

import csv

IP, EXISTS, PROTOCOL = 'IP', 'Exists', 'Protocol'  # Field names referenced.

# Read entire input file into a list.
with open('input.csv', 'r', newline='') as inp:
    reader = csv.DictReader(inp)
    inputs = list(reader)

# Update input rows that match data in search.csv file.
with open('search.csv', 'r', newline='') as sea:
    sea_reader = csv.DictReader(sea)
    for row in sea_reader:
        protocol, ip = row[PROTOCOL], row[IP]
        for input_ in inputs:
            if input_[PROTOCOL] == protocol and input_[IP] == ip:  # Match?
                input_[EXISTS] = 'No'
                break

# Write updated input.csv data out into a file.
with open('input_updated.csv', 'w', newline='') as outp:
    fieldnames = inputs[0]
    writer = csv.DictWriter(outp, fieldnames)
    writer.writeheader()
    for input_ in inputs:
        writer.writerow(input_)

print('done')

Input.csv

Name IP Protocol Port Exists
l1 192.132.16.02 HTTP 80
l2 192.132.16.03 SSL 8443
l3 192.132.16.03 SSL 443
l4 192.132.16.04 SSL 443

search.csv

No Protocol IP Port
1 HTTP 192.132.16.02 80
2 SSL 192.132.16.03 443
3 SSL 192.132.16.03 8443

Here in search.csv there are two records which has same protocol and ip but different port. In the result it is just including one record and not both. I tried adding port in the condition too but it does not work

Fatilearns
  • 33
  • 4

2 Answers2

0

You can use pandas to read the csv files. Create indexes on both IP, Protocol for both files. Iterate the search_df and insert value into the Exists column as True if index of search_df is found in input_df. Reset the index and drop nan values.


import pandas as pd
input_df = pd.read_csv("input.csv")
search_df = pd.read_csv("search.csv")
input_df = input_df.set_index(["IP","Protocol"])
search_df = search_df.set_index(["IP","Protocol"])
for index, _ in search_df.iterrows():
    input_df.loc[index,"Exists"] = True
input_df = input_df.reset_index()
input_df = input_df.dropna()

theunknownSAI
  • 300
  • 1
  • 4
0

The only issue in your code is that you break out of the loop as soon as you find a match, so you will never find further matches. Removing the break will make it work, although it will be slower of course.

One thing is unclear to me: is the port number relevant? In your example the second input line would be matched even if the third search line didn't exist. So it would probably be a good idea to include the port number in the match test:

with open('search.csv', 'r', newline='') as sea:
    sea_reader = csv.DictReader(sea)
    for row in sea_reader:
        protocol, ip, port = row[PROTOCOL], row[IP], row[PORT]
        for input_ in inputs:
            if input_[PROTOCOL] == protocol and input_[IP] == ip and input_[PORT] == port:  # Match?
                input_[EXISTS] = 'No'
                break

In this case the break can remain, cause you will have only exact matches (i.e. input #2 with search #3 and input #3 with search #2)

gimix
  • 3,431
  • 2
  • 5
  • 21