3
df1 = pd.read_excel(mxln)  # Loads master xlsx for comparison
df2 = pd.read_excel(sfcn)  # Loads student xlsx for comparison
difference = df2[df2 != df1]  # Scans for differences

Wherever there is a difference, I want to store those cell locations in a list. It needs to be in the format 'A1' (not something like [1, 1]) so I can pass it through this:

redFill = PatternFill(start_color='FFEE1111', end_color='FFEE1111', fill_type='solid')
lsws['A1'].fill = redFill
lsfh.save(sfcn)  

I've looked at solutions like this, but I couldn't get it to work/don't understand it. For example, the following doesn't work:

def highlight_cells():
    df1 = pd.read_excel(mxln)  # Loads master xlsx for comparison
    df2 = pd.read_excel(sfcn)  # Loads student xlsx for comparison
    difference = df2[df2 != df1]  # Scans for differences
    return ['background-color: yellow']

df2.style.apply(highlight_cells) 
Community
  • 1
  • 1
stackoa
  • 73
  • 2
  • 7

1 Answers1

2

To get the difference cells from two pandas.DataFrame as excel coordinates you can do:

Code:

def diff_cell_indices(dataframe1, dataframe2):
    from openpyxl.utils import get_column_letter as column_letter

    x_ofs = dataframe1.columns.nlevels + 1
    y_ofs = dataframe1.index.nlevels + 1
    return [column_letter(x + x_ofs) + str(y + y_ofs) for
            y, x in zip(*np.where(dataframe1 != dataframe2))]

Test Code:

import pandas as pd
df1 = pd.read_excel('test.xlsx')
print(df1)

df2 = df.copy()
df2.C['R2'] = 1
print(df2)

print(diff_cell_indices(df1, df2))

Results:

    B  C
R2  2  3
R3  4  5

    B  C
R2  2  1
R3  4  5

['C2']
Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
Stephen Rauch
  • 47,830
  • 31
  • 106
  • 135
  • @CharlieClark, thanks for the edit. As an FYI, I copy and paste that from http://stackoverflow.com/a/33598710/7311767 – Stephen Rauch May 14 '17 at 13:13
  • I don't understand what "df2.C['R2'] = 1" is doing. Also, when I run the function I get a huge list of seemingly random cell coordinates, not the ones that are different. – stackoa May 15 '17 at 14:30
  • That just forces the two frames to have a difference, so that the test code can display a difference. It literally reads as set column C row R2 equal to 1. That is pandas. And I would have to see you actual data to help with it not showing the correct differences. Would suggest trying my test first, and then build from there. – Stephen Rauch May 15 '17 at 14:35
  • Is it supposed to be df2 = df1.copy()? If I just use df.copy() it says name df not defined. If I make it df1.copy() it says DataFrame object has no attribute 'C' – stackoa May 15 '17 at 15:22
  • `C` is the column name of my test data, just like `R2` is a row name. You are likely going to need to do some studying of pandas if you wish to use it productively. – Stephen Rauch May 15 '17 at 15:33
  • I got a test to work. I typed the cells in the excel file myself, though. Am I correct in that the pandas dataframe shifts everything over 1 column? So if there was a diff in A2, it'd print as B2? This might cause issues because I don't want to save this as a pandas dataframe, I want it to be a normal xlsx. What are your thoughts on this thread (created by me)? http://stackoverflow.com/questions/43789051/python-excel-highlight-cell-differences Instead of printing out a list of cell coordinates, could I just highlight what the diff detects? I couldn't get the given solution to work – stackoa May 16 '17 at 14:27
  • If you don't really want pandas and just want to look at an Excel file, I would suggest just using openpyxl, and looping through the cells with your own code. – Stephen Rauch May 16 '17 at 15:15
  • I've gotten your code to work. It prints out cell locations and highlights them. The problem is that it also highlights empty cells and I can't figure out how to fix it. I've tried a few things like: dataframe2 == nan and df2 == '' and a few other things but it doesn't work. Any thoughts? – stackoa Jun 07 '17 at 14:32
  • @stackoa, My best suggestion is to post a new question and show your code. If you like, you can post a link here to make sure I see the post. Also you might want to take a look at [this link](http://stackoverflow.com/help/someone-answers) – Stephen Rauch Jun 07 '17 at 14:34