why the excel file comparison not working with pandas?

Question

I am working on the below automation requirements:

compare two excel files, and based on the logic create 3rd csv file with the final comparison.
My both excel files have same col names "Name", and "Size".

a) For each weekly release an excel file is generated, and the file contains "name" and "Size" fields.

b) I need to compare the previous weekly release and current weekly release excel files.

c) The 3rd CSV file that is created will still contains the current weekly release "size" value. However, if the previous weekly release's "size" and current weekly release "size" differ, then I just need to give them a color code to those two cells.

Based on the above requirements, I have written below program.

import pandas as pd
import re

A = pd.read_excel('abc.xlsx')
B = pd.read_excel('zyx.xlsx')


Name_A = A['Name'].tolist()
Size_A = A['Size'].tolist()

Name_B = B['Name'].tolist()
Size_B = B['Size'].tolist()

Ref_dict = dict(zip(Name_A, Size_A))
Unknown_dict = dict(zip(Name_B, Size_B))


filename = 'seq_match_compare.csv'
f = open(filename, 'w')
headers = 'Name, Size\n'
f.write(headers)

for ID, seq in Unknown_dict.items():
    for species, seq1 in Ref_dict.items():
        m = re.search(ID, seq1)
        if m:
            match = m.group()
            pos = m.start() + 1
            f.write(str(ID) + ',' + ID + ',' + species + ',' + seq1 + ',' + match + ',' + str(pos) + '\n')

f.close()

But I am getting below errors:

Traceback (most recent call last):
  File "<path_to>/xl_diff.py", line 35, in <module>
    m = re.search(ID, seq1)
  File "<path_to>\AppData\Local\Programs\Python\Python37\lib\re.py", line 183, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

Note:

The 3rd CSV file will already have previous weekly release data. I need to print the current weekly release data(Size) in new empty column. sample screenshot attached.

Not sure, if my method is correct. Please suggest.

You can replace all this code with `merge`, to perform a "left-join" between the new dataframe and the old and retain only the rows found in the new dataframe — Panagiotis Kanavos, Feb 07 '22 at 08:05
@PanagiotisKanavos, The other link you shared will not work for me, as I still need to print the 'size' even if the value hasn't changed, and if the value changed then just give the color to the corresponding cells. — user3521180, Feb 07 '22 at 08:14
`merge` works. You need to explain your business logic though. `merge` *adds* columns that show the differences. Once that's done you can select which columns to include or exclude. If you only need to find differences by name, just use `'Name'` as the key — Panagiotis Kanavos, Feb 07 '22 at 08:16
besides, a CSV is a text file, it has no color. It means `comma separated values`. It's just a text file with values separated by commas, eg `A,B,C,1,234.5`. To display data with colors you need a format that supports styling and colors, eg Excel or HTML — Panagiotis Kanavos, Feb 07 '22 at 08:30
You should also check [How to save pandas to excel with different colors](https://stackoverflow.com/questions/54109548/how-to-save-pandas-to-excel-with-different-colors). The answer is - you can't do that with pandas but you can open the Excel file afterwards and add a conditional format formula that highlights the changed cells, eg based on the `_merge` column. — Panagiotis Kanavos, Feb 07 '22 at 08:35
It looks like you want to append *columns* to the diff file though. In that case you'll probably have to edit the Excel file explicitly, using the diff rows produced by `merge`. — Panagiotis Kanavos, Feb 07 '22 at 08:37
This thread could be concluded now. The "merge" functions works. — user3521180, Feb 08 '22 at 13:53

why the excel file comparison not working with pandas?

0 Answers0