I am working on the below automation requirements:
- compare two excel files, and based on the logic create 3rd csv file with the final comparison.
- My both excel files have same col names "Name", and "Size".
a) For each weekly release an excel file is generated, and the file contains "name" and "Size" fields.
b) I need to compare the previous weekly release and current weekly release excel files.
c) The 3rd CSV file that is created will still contains the current weekly release "size" value. However, if the previous weekly release's "size" and current weekly release "size" differ, then I just need to give them a color code to those two cells.
Based on the above requirements, I have written below program.
import pandas as pd
import re
A = pd.read_excel('abc.xlsx')
B = pd.read_excel('zyx.xlsx')
Name_A = A['Name'].tolist()
Size_A = A['Size'].tolist()
Name_B = B['Name'].tolist()
Size_B = B['Size'].tolist()
Ref_dict = dict(zip(Name_A, Size_A))
Unknown_dict = dict(zip(Name_B, Size_B))
filename = 'seq_match_compare.csv'
f = open(filename, 'w')
headers = 'Name, Size\n'
f.write(headers)
for ID, seq in Unknown_dict.items():
for species, seq1 in Ref_dict.items():
m = re.search(ID, seq1)
if m:
match = m.group()
pos = m.start() + 1
f.write(str(ID) + ',' + ID + ',' + species + ',' + seq1 + ',' + match + ',' + str(pos) + '\n')
f.close()
But I am getting below errors:
Traceback (most recent call last):
File "<path_to>/xl_diff.py", line 35, in <module>
m = re.search(ID, seq1)
File "<path_to>\AppData\Local\Programs\Python\Python37\lib\re.py", line 183, in search
return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
Note:
- The 3rd CSV file will already have previous weekly release data. I need to print the current weekly release data(Size) in new empty column. sample screenshot attached.
Not sure, if my method is correct. Please suggest.