2

I'm still new to Python and I'm trying to adapt this code to work for me from this post.

The difference between that post and what I'm looking for is that I am looking to concatenate the entire contents of the matching rows from both hosts.csv and masterlist.csv when a matching 'signature' is found in both files.

So if hosts.csv looked like this:

Path    Filename    Size    Signature
C:\     a.txt       14kb    012345
D:\     b.txt       99kb    678910
C:\     c.txt       44kb    111213

And masterlist.csv looked like this:

Signature    Name    State
012345       Joe     CT
567890       Sue     MA
111222       Dan     MD

Tinkering with the code posted by Martijn Pieters in his response to Serk's post, his code gets me most of the way there.

import time, csv
timestr = time.strftime("%Y%m%d_%H%M")
outputfile = "Results_" + (timestr) + ".csv"

    with open('masterlist.csv', 'rb') as master:
        master_indices = dict((r[0], i) for i, r in enumerate(csv.reader(master)))

    with open('hosts.csv', 'rb') as hosts:
        with open('results.csv', 'wb') as results:    
            reader = csv.reader(hosts)
            writer = csv.writer(results)

            writer.writerow(next(reader, []) + ['RESULTS'])

            for row in reader:
                index = master_indices.get(row[3])
                if index is not None:
                    message = 'FOUND in (row {})'.format(index)
                else:
                    message = 'NOT FOUND'
                writer.writerow(row + [message])

Instead of just adding the RESULTS column as Serk was looking for that indicates the matching signature, how can I pull in the corresponding rows from the masterlist.csv and hosts.csv files and concatenate the two together in the results.csv file? The desired output file would look like this:

Path    Filename    Size    RESULTS          Signature    Name  State    
C:\     a.txt       14kb    FOUND in Row 1   012345       Joe   CT
D:\     b.txt       99kb    FOUND in Row 2   678910       Sue   MA
C:\     c.txt       44kb    NOT FOUND        111213

Thanks in advance, responses on here have already help me out with most of the solutions I have been looking for!

Community
  • 1
  • 1
Tommyboy
  • 21
  • 2
  • Do you have to use Python? The `join` shell command does this already. (Also, the name of the operation you're doing is "join." It's really common in relational databases.) – David Ehrmann Aug 09 '16 at 22:27

1 Answers1

3

Use pandas.read_csv and merge on "Signature" columns

import pandas as pd

hosts_df = pd.read_csv("hosts.csv ")
masterlist_df = pd.read_csv("masterlist.csv")
results = masterlist_df.merge(hosts_df, on="Signature", how="outer")
results.to_csv("results.csv")
rwester
  • 154
  • 9