How can I find and highlight duplicates in Excel files using openpyxl, numpy and pandas?

Question

I'm working on an app that finds and highlights duplicates in Excel files. I have the following code, which for now just writes --> into the cells where there are duplicates.

import pandas as pd
import numpy as np
from openpyxl import Workbook

dftest1 = pd.read_excel('files/test1.xlsx')
dftest2 = pd.read_excel('files/test2.xlsx')

# Boolean mask of the cells that hold the same value in both sheets
comparevalues = dftest1.values == dftest2.values
rows, cols = np.where(comparevalues)
for row, col in zip(rows, cols):
    dftest1.iloc[row, col] = '{} --> {}'.format(dftest1.iloc[row, col], dftest2.iloc[row, col])

# Write the output once, after every matching cell has been marked
dftest1.to_excel('./files/output.xlsx', index=True, header=False)

I tried using fill but got the error that fill is read-only. How can I use styles or fill to highlight duplicates?
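
One possible direction, as a minimal sketch rather than a definitive answer: openpyxl style objects are immutable, so changing attributes of an existing `cell.fill` fails, while assigning a new `PatternFill` to `cell.fill` is allowed. The helper name `highlight_matches`, the yellow color, and the demo file contents below are assumptions made for illustration. The sketch also assumes both sheets have the same shape and that the first worksheet is the one to compare.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill


def highlight_matches(path1, path2, out_path, color="FFFF00"):
    """Save a copy of path1 in which every cell equal to the cell at the
    same position in path2 is filled. Returns the matching (row, col) pairs."""
    # header=None keeps every sheet row as data, so positions map 1:1 to cells
    df1 = pd.read_excel(path1, header=None)
    df2 = pd.read_excel(path2, header=None)
    if df1.shape != df2.shape:
        raise ValueError("both sheets must have the same shape")

    rows, cols = np.where(df1.values == df2.values)

    wb = load_workbook(path1)
    ws = wb.worksheets[0]  # pandas also reads the first sheet by default
    # Build a new fill and assign it; existing style objects cannot be mutated
    fill = PatternFill(start_color=color, end_color=color, fill_type="solid")
    for r, c in zip(rows, cols):
        # openpyxl rows/columns are 1-based, NumPy positions are 0-based
        ws.cell(row=int(r) + 1, column=int(c) + 1).fill = fill
    wb.save(out_path)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


# Small demo with throwaway files (names and values are placeholders)
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "test1.xlsx")
    second = os.path.join(tmp, "test2.xlsx")
    pd.DataFrame([[1, "x"], [2, "y"]]).to_excel(first, header=False, index=False)
    pd.DataFrame([[1, "z"], [3, "y"]]).to_excel(second, header=False, index=False)
    print(highlight_matches(first, second, os.path.join(tmp, "output.xlsx")))
```

Because the workbook is loaded and saved with openpyxl instead of being rewritten by `DataFrame.to_excel`, the original cell values stay untouched and only the fill changes. Empty cells are read as NaN, which never compares equal, so they are not highlighted.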

You could apply a color for a selection of cells based on conditions: See https://stackoverflow.com/a/73209396/13604396 or [Styling](https://pandas.pydata.org/pandas-docs/version/1.1.5/user_guide/style.html) from the `pandas` docs. — Confused Learner, Aug 03 '22 at 12:06
Question: Excel is known to be able to highlight cells by itself, so why are you using pandas for such a task? Now, if you want to make a visualization in pandas while displaying it in a Jupyter notebook and so on, that is another question. — INGl0R1AM0R1, Aug 03 '22 at 14:33
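
The styling route suggested in the first comment could look roughly like the sketch below. It relies on `Styler.apply` with `axis=None`, where the function must return a DataFrame of CSS strings with the same index and columns as the data. The helper names `match_css` and `style_matches` and the color are hypothetical, and both frames are assumed to have the same shape.

```python
import numpy as np
import pandas as pd


def match_css(df1, df2, color="#FFFF00"):
    """Return a frame shaped like df1 holding a CSS background rule for
    every cell equal to the same cell in df2, and '' everywhere else."""
    if df1.shape != df2.shape:
        raise ValueError("both frames must have the same shape")
    mask = df1.values == df2.values
    styles = np.where(mask, f"background-color: {color}", "")
    return pd.DataFrame(styles, index=df1.index, columns=df1.columns)


def style_matches(df1, df2, color="#FFFF00"):
    """Return a Styler for df1 with the matching cells colored."""
    css = match_css(df1, df2, color)
    # axis=None passes the whole frame; the precomputed CSS frame is returned
    return df1.style.apply(lambda _: css, axis=None)
```

With the frames from the question, this might be used as `style_matches(dftest1, dftest2).to_excel('./files/output.xlsx', engine='openpyxl')`. Note that `DataFrame.style` needs the optional `jinja2` dependency to be installed.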

0 Answers