-1

I have 2 csv files like:

format

  • REPORT_NUM,EXEC,REPORT_NAME,REPORT_COUNT

before.csv

  • 1,1,"Report 1",45
  • 2,1,"Report 2",456
  • 3,1,"Report 3",11
  • 4,1,"Report 4",0

after.csv

  • 1,1,"Report 1",47
  • 2,1,"Report 2",456556
  • 3,1,"Report 3",0
  • 4,1,"Report 4",212

I basically need for each REPORT_NUM to compare REPORT_COUNT and then output a 3rd csv with REPORT_NAME,before REPORT_COUNT, after REPORT_COUNT when there's a threshold cross ( when the after is more than 10% different to before ) . EXEC is just an execution run.

So result.csv might show:

  • 2,1,"Report 2",456,456556
  • 3,1,"Report 3",11,0
  • 4,1,"Report 4",0,212

I am looking at the following for inspiration:

Comparing values between 2 CSV files and writing to a 3rd CSV file

Python: Comparing two CSV files and searching for similar items

I continue to search ,but any feedback appreciated.

Thank you in advance!

p.s. I am assuming Python is best , I dont mind what language but I have basic python understand. I started writing this in bash and using "diff" and "sed" .. and so I may go that route..

Gripsiden
  • 467
  • 2
  • 15

1 Answers1

1

Based on the 2 links you gave:

import csv

with open('before.csv', 'r') as before:
    before_indices = dict((i[2], i[3]) for i in csv.reader(before))


with open('after.csv', 'r') as reportAfter:
    with open('results.csv', 'w') as results:
        reader = csv.reader(reportAfter)
        writer = csv.writer(results, quoting=csv.QUOTE_NONNUMERIC)

        for row in reader:
            value = before_indices.get(row[2])

            if float(row[3]) > 1.1*float(value) or float(row[3]) < 0.9*float(value):
                writer.writerow([int(row[0]),int(row[1]),row[2],int(value),int(row[3])])

this produces your desired output given your example input on linux. On windows you need to change according to this Python3: writing csv files. If you have non-integer numbers you may want to change the int() in the last line to float() to keep decimals.

voiDnyx
  • 975
  • 1
  • 11
  • 24
  • This is great :) Just a quick question - if i need to adjust the % tolerance , say to 5% I change the 1.1 and 0.9 values? – Gripsiden Oct 23 '17 at 13:58
  • 1
    Yes thats correct, also keep in mind, comparing very small or very large numbers needs to be done carefully cause of floating point precision. – voiDnyx Oct 23 '17 at 14:00
  • OOps sorry , i thought i could adjust ( for 5% , to 0.6 > and 0.4 ) but doesnt appear to be the case.. sorry for being a dummy on this one. Can you give any guidance on how that is used? – Gripsiden Oct 23 '17 at 14:09
  • 1
    for 5% difference you need to adjust to larger than 5% more `> 1.05` or smaller than 5% less `< 0.95`, basically `1.0 -5% or +5%` – voiDnyx Oct 23 '17 at 14:11
  • Thank you sir , have a great day! – Gripsiden Oct 23 '17 at 14:13