0

I'm trying to compare two excel files and then output the non equal values into a new excel file.

Currently in the excel3 file it will display (value in excel1) --> (value in excel2) but I also want to add a red background to the cell as well so it is easily visible.

I've tried looking around online and can't figure it out. I'm pretty new to python as well.

#Needed packages
import pandas as pd
import numpy as np

#Changes the col number into its corresponding excel col letter
def col_num(n):
    n = n + 1
    string = ""
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        string = chr(65 + remainder) + string
    return string

#Puts the characters from the col_num method into a string (Could be improved)   
def char_array(cols):
    i = 0
    ex_cols = ""
    while i < len(cols):
        if i == len(cols) - 1:
            ex_cols += (col_num(cols[i]))
        else:
            ex_cols += (col_num(cols[i])) + " "
        i += 1
    return ex_cols

print("\nExcel Comparer v1.2\n")

#Retrieve excel files for comparison
while True:
    file = input("First Excel file for comparison: ")
    try:
        df1 = pd.read_excel(file + ".xlsx")
        break
    except FileNotFoundError:
        print("File not Found, please make sure this program is in the same directory as both excel files.")
while True:
    file = input("Second Excel file for comparison: ")
    try:
        df2 = pd.read_excel(file + ".xlsx")
        break
    except FileNotFoundError:
        print("File not Found, please make sure this program is in the same directory as both excel files.")
print("\n\nFiles compared succesfully!\n\n")

#determines whether the files are exactly equal
print("\nAre the Documents exactly the same:", df1.equals(df2))

#shows each cell as being either equal(True) or not equal(False)
values_compared = df1.values == df2.values
print("\nEach cell on whether or not they're equivalent:\n", values_compared)

#Get all cells where the values are not equal(False)
rows, cols = np.where(values_compared == False)
print("\nThe indexes of each non-equal value:")
print("Col: [", char_array(cols), "]")
print("Row: ", (rows + 2))

#df1 will now show the differences between the two files
for item in zip(rows, cols):
    df1.iloc[item[0], item[1]] = '{} --> {}'.format(df1.iloc[item[0], item[1]], df2.iloc[item[0], item[1]])

#Creates a new excel file and writes the differences shown
df1.to_excel('./excel3.xlsx', index = False, header = True)

print("\nexcel3.xlsx has been written to this directory with the discrepancies.")
Toto
  • 89,455
  • 62
  • 89
  • 125
  • 1
    Pandas does not know about Excel. You need another package for that. – Stefan Falk May 31 '19 at 16:54
  • @displayname are yo usure about your statement?It seems to me like you can use `Styler` objects https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html – JacoSolari Sep 07 '20 at 14:50
  • @Adam Espinosa check these two SO answers: [this](https://stackoverflow.com/questions/28075699/coloring-cells-in-pandas) and [this](https://stackoverflow.com/questions/39299509/coloring-cells-in-excel-with-pandas) – JacoSolari Sep 07 '20 at 14:52

1 Answers1

0

Pandas isn't going to do that. You'll need another library. xlsxwriter can do what you've asked. Here's the documentation.

https://xlsxwriter.readthedocs.io/working_with_pandas.html

Chris
  • 15,819
  • 3
  • 24
  • 37
  • pandas can actually edit the style of an excel output https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html – JacoSolari Sep 07 '20 at 14:53