179

We have a lot of spreadsheets (xls) in our source code repository. These are usually edited with gnumeric or openoffice.org, and are mostly used to populate databases for unit testing with dbUnit. There are no easy ways of doing diffs on xls files that I know of, and this makes merging extremely tedious and error prone.

I've tried to converting the spreadsheets to xml and doing a regular diff, but it really feels like it should be a last resort.

I'd like to perform the diffing (and merging) with git as I do with text files. How would I do this, e.g. when issuing git diff?

pnuts
  • 58,317
  • 11
  • 87
  • 139
neu242
  • 15,796
  • 20
  • 79
  • 114
  • 4
    I think the tags unit-testing and dbunit are incorrectly applied here. The question is about file compare for a specific file format and has nothing to do with unit testing. – Hamish Smith Sep 23 '08 at 07:29
  • 1
    Not an answer (it requires Excel & is a commercial product) but for people brought here by GooBinghoo - http://www.formulasoft.com/excel-compare.html works well for me. – CAD bloke Aug 21 '14 at 22:12
  • 2
    I use this Python script for diffing Excel we check-in to git. I ported this GO code (which was ported from Perl) to Python: https://github.com/tokuhirom/git-xlsx-textconv#see-also It allows you to use `git diff` and `gitk` – nmz787 Oct 02 '15 at 21:52
  • BeyondCompare? http://www.scootersoftware.com/features.php?zz=features_multifaceted – Nishant Sharma Dec 09 '15 at 04:55
  • A simple way to go would be to export the data of both spreadsheets as CSV/text and doing a regular diff (with your preferred editor or diff files) – PPC Jan 24 '18 at 15:22
  • @PPC That's what I'm trying to avoid. Read the original post again ;) – neu242 Jan 25 '18 at 16:33

22 Answers22

119

We faced the exact same issue in our co. Our tests output excel workbooks. Binary diff was not an option. So we rolled out our own simple command line tool. Check out the ExcelCompare project. Infact this allows us to automate our tests quite nicely. Patches / Feature requests quite welcome!

Dennis C
  • 24,511
  • 12
  • 71
  • 99
na_ka_na
  • 1,558
  • 1
  • 12
  • 15
  • 2
    @KimStacks yes works for all xls, xlsx, ods. And can even compare any one type with the other, e.g. xls v/s xlsx. – na_ka_na May 23 '14 at 21:59
  • 1
    Great tool... but it says "Diff failed: Failed to read as excel file:" for some perfectly valid xls files. For anyone looking for a (much poorer, from all other points of view) alternative: see https://github.com/toobaz/xlrd_diff – Pietro Battiston May 09 '15 at 13:47
  • 2
    @PietroBattiston pls log a ticket in github and I'll look at it. – na_ka_na May 11 '15 at 17:40
  • @na_ka_na Thank you for creating this tool! – jgpawletko Mar 02 '16 at 14:15
115

Quick and easy with no external tools, works well as long as the two sheets you are comparing are similar:

  • Create a third spreadsheet
  • Type =if(Sheet1!A1 <> Sheet2!A1, "X", "") in the top left cell (or equivalent: click on the actual cells to automatically have the references inserted into the formula)
  • Ctrl+C (copy), Ctrl+A (select all), Ctrl+V (paste) to fill the sheet.

If the sheets are similar, this spreadsheet will be empty except for a few cells with X in them, highlighting the differences. Unzoom to 40% to quickly see what is different.

Jasper
  • 2,166
  • 4
  • 30
  • 50
Laurent
  • 5,953
  • 14
  • 43
  • 59
  • I searched stackoverflow for a solution to this, I use mostly numeric data so I do a subtraction to see the differance. And conditional formatting to set a red-color for reduced and blue for increased. What I'm looking for is an easy way to deal with non-numeric codes mixed with the numbers. – tovare Jun 09 '10 at 07:03
  • 8
    Doesn't this do cell by cell comparison? I mean if left side has one extra row at top, it will give all the remaining rows (and cells) different. If it is like that, that is not very useful. – TheTechGuy Nov 16 '11 at 14:12
  • 1
    @Thecrocodilehunter: you can always delete that row at the top and then compare the rest. If the differences are a lot more complex, then of course you need a different tool. This is useful for quick one-off comparisons, like when Excel says you changed the spreadsheet and you're afraid you accidentally edited a field. – iconoclast Oct 16 '12 at 19:00
  • 1
    I liked this one. Instead of doing X, you can also do "1", and have a SUM as the first rows&columns. Add a som on the very first field and you'll quickly see how many fields differ. – Konerak Feb 13 '13 at 10:00
  • 6
    I changed the formula a bit so I didn't have to go looking at what the exact differences were. =IF(Sheet1!A1 <> Sheet2!A1, CONCATENATE("Sheet 1=",Sheet1!A1, " Sheet 2=",Sheet2!A1), "") – Martyn Feb 21 '13 at 15:46
  • Use . instead of ! for libre/openoffice. =IF(Sheet1.A1 <> Sheet2.A1, "X", "") – CoR Oct 16 '13 at 09:41
  • 2
    This answer compares cell values rather than formulas. If you want to compare the cell formulas rather than the values produced by those formulas, first convert all formulas to static values then use the technique from this answer. To convert (first backup your spreadsheet), do a find/replace on = and replace with something else like %%%. Now use this comparison technique on the results. – jlpp Feb 15 '14 at 14:55
  • 1
    You can just type `=Sheet1!A1=Sheet2!A1`. This will print TRUE or FALSE. You can then do conditional formatting or `=countif(A1:B2, FALSE)` or something similar. – user2023861 Sep 18 '14 at 15:52
15

You can try this free online tool - www.cloudyexcel.com/compare-excel/

It gives a good visual output online, in terms of rows added,deleted, changed etc.

enter image description here

Plus you donot have to install anything.

Shashank Singla
  • 1,797
  • 17
  • 13
13

I've done a lot of comparing of Excel workbooks in the past. My technique works very well for workbooks with many worksheets, but it only compares cell contents, not cell formatting, macros, etc. Also, there's some coding involved but it's well worth it if you have to compare a lot of large files repeatedly. Here's how it works:

A) Write a simple dump program that steps through all worksheets and saves all data to tab-separated files. Create one file per worksheet (use the worksheet name as the filename, e.g. "MyWorksheet.tsv"), and create a new folder for these files each time you run the program. Name the folder after the excel filename and add a timestamp, e.g. "20080922-065412-MyExcelFile". I did this in Java using a library called JExcelAPI. It's really quite easy.

B) Add a Windows shell extension to run your new Java program from step A when right-clicking on an Excel file. This makes it very easy to run this program. You need to Google how to do this, but it's as easy as writing a *.reg file.

C) Get BeyondCompare. It has a very cool feature to compare delimited data by showing it in a nice table, see screenshot.

D) You're now ready to compare Excel files with ease. Right-click on Excel file 1 and run your dump program. It will create a folder with one file per worksheet. Right-click on Excel file 2 and run your dump program. It will create a second folder with one file per worksheet. Now use BeyondCompare (BC) to compare the folders. Each file represents a worksheet, so if there are differences in a worksheet BC will show this and you can drill down and do a file comparison. BC will show the comparison in a nice table layout, and you can hide rows and columns you're not interested in.

Jasper
  • 2,166
  • 4
  • 30
  • 50
thvo
  • 1,532
  • 2
  • 15
  • 29
10

I have found xdocdiff WinMerge Plugin. It is a plugin for WinMerge (both OpenSource and Freeware, you doesn't need to write a VBA nor save an excel to csv or xml). It works just for the celd's contains.

This plugin supports also:

  • .rtf Rich Text
  • .docx/.docm Microsoft WORD 2007(OOXML)
  • .xlsx/.xlsm Microsoft Excel 2007(OOXML)
  • .pptx/.pptm Microsoft PowerPoint 2007(OOXML)
  • .doc Microsoft WORD ver5.0/95/97/2000/XP/2003
  • .xls Microsoft Excel ver5.0/95/97/2000/XP/2003
  • .ppt Microsoft PowerPoint 97/2000/XP/2003
  • .sxw/.sxc/.sxi/.sxd OpenOffice.org
  • .odt/.ods/.odp/.odg Open Document
  • .wj2/wj3/wk3/wk4/123 Lotus 123
  • .wri Windows3.1 Write
  • .pdf Adobe PDF
  • .mht Web Archive
  • .eml Exported files from OutlookExpress

Regard, Andres

LPL
  • 16,827
  • 6
  • 51
  • 95
Andres
  • 101
  • 1
  • 2
  • 1
    Unfortunately, it doesn't allow saving changes to the unpacked files, so it isn't useful for merging. It is free though. – Sogger Jan 09 '12 at 17:19
6

Hmmm. From the Excel menu choose Window -> Compare side by side?

  • 1
    +1 works well but does not highlights differences in office 2007. I guess in older versions it does. Still very good for visual comparison. – TheTechGuy Nov 14 '11 at 15:18
  • 3
    yeah, there is no highlighting, so the only benefit is synchronous scrolling. – Sogger Jan 09 '12 at 17:24
5

Do you use TortoiseSVN for doing your commits and updates in subversion? It has a diff tool, however comparing Excel files is still not really user friendly. In my environment (Win XP, Office 2007), it opens up two excel files for side by side comparison.

Right click document > Tortoise SVN > Show Log > select revision > right click for "Compare with working copy".

Casper
  • 1,242
  • 1
  • 11
  • 12
4

There is a library daff (short for data diff) which helps in comparing tables, producing a summary of their diffs, and using such a summary as a patch file.

It is written in Haxe, so it can be compiled in major languages.

I have made an Excel Diff Tool in Javascript with help of this library. It works well with numbers & small strings but the output is not ideal for long strings (e.g. a long sentence with with minor character change).

shubhu
  • 123
  • 5
3

I would use the SYLK file format if performing diffs is important. It is a text-based format, which should make the comparisons easier and more compact than a binary format. It is compatible with Excel, Gnumeric, and OpenOffice.org as well, so all three tools should be able to work well together. SYLK Wikipedia Article

Adam Hawkes
  • 7,218
  • 30
  • 57
  • This is a great solution tat should be adopted as common practice in environments that are frequently using git with Excel files (and other files). It is definitely "git" friendly (though the diffs are not super human readable) and doesn't require any extra tools beyond "modern" Excel (it is 2019 now). It is also "two-way" which means that other users can save their Excel spreadsheets in .slk (SYLK) format and then they open with all of the proper formatting, etc. in Excel when needed. – D. Woods Oct 07 '19 at 20:29
  • This is a good solution but LibreOffice drops some of the file formatting but if the primary concern is data then it might be a very clean solution. – AnthonyVO Sep 05 '22 at 18:28
3

I know several responses have suggested exporting the file to csv or some other text format, and then comparing them. I haven't seen it mentioned specifically, but Beyond Compare 3 has a number of additional file formats that it supports. See Additional File Formats. Using one of the Microsoft Excel File Formats you can easily compare two Excel files without going through the export to another format option.

n00begon
  • 3,503
  • 3
  • 29
  • 42
mattsmith321
  • 6,793
  • 5
  • 28
  • 18
3

Newer versions of MS Office come with Spreadsheet Compare, which performs a fairly nice diff in a GUI. It detects most kinds of changes.

Gremlin
  • 228
  • 1
  • 11
  • Although this answer probably won't help OP's situation with a command line based `diff` merge, this Spreadsheet Compare tool was perfect for my purposes (checking for differences between OpenXML automation output vs Excel COM automation output). – ErrCode Sep 10 '17 at 12:19
  • Note: I did try to investigate automation using this tool, but it didn't work for me (tool just kept crashing): https://stackoverflow.com/a/35905262/7270462 – ErrCode Sep 10 '17 at 12:38
2

Use Altova DiffDog

Use diffdog's XML diff mode and Grid View to review the differences in an easy to read tabular format. Text diff'ing is MUCH HARDER for spreadsheets of any complexity. With this tool, at least two methods are viable under various circumstances.

  1. Save As .xml

    To detect the differences of a simple, one sheet spreadsheet, save the Excel spreadsheets to compare as XML Spreadsheet 2003 with a .xml extension.

  2. Save As .xlsx

    To detect the differences of most spreadsheets in a modularized document model, save the Excel spreadsheets to compare as an Excel Workbook in .xlsx form. Open the files to diff with diffdog. It informs you that the file is a ZIP archive, and asks if you want to open it for directory comparison. Upon agreeing to directory comparison, it becomes a relatively simple matter of double-clicking logical parts of the document to diff them (with the XML diff mode). Most parts of the .xslx document are XML-formatted data. The Grid View is extremely useful. It is trivial to diff individual sheets to focus the analysis on areas that are known to have changed.

Excel's propensity to tweak certain attribute names with every save is annoying, but diffdog's XML diff'ing capabilities include the ability to filter certain kinds of differences. For example, Excel spreadsheets in XML form contain row and c elements that have s attributes (style) that rename with every save. Setting up a filter like c:s makes it much easier to view only content changes.

diffdog has a lot of diff'ing capability. I've listed the XML diff modes only simply because I haven't used another tool that I liked better when it comes to differencing Excel documents.

kbulgrien
  • 4,384
  • 2
  • 26
  • 43
1

If you're using Java, you could try simple-excel.

It'll diff spreadsheets using Hamcrest matchers and output something like this.

java.lang.AssertionError:
Expected: entire workbook to be equal
     but: cell at "C14" contained <"bananas"> expected <nothing>,
          cell at "C15" contained <"1,850,000 EUR"> expected <"1,850,000.00 EUR">,
          cell at "D16" contained <nothing> expected <"Tue Sep 04 06:30:00">
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)

I should qualify that we wrote that tool (like the ticked answer rolled their own).

Toby
  • 9,523
  • 8
  • 36
  • 59
  • 2
    Thanks for posting your answer! Please be sure to read the [FAQ on Self-Promotion](http://stackoverflow.com/faq#promotion) carefully. Also note that it is *required* that you post a disclaimer every time you link to your own site/product. – Andrew Barber Sep 10 '12 at 22:43
1

If you have TortoiseSVN then you can CTRL click the two files to select them in Windows Explorer and then right-click, TortoiseSVN->Diff.

This works particularly well if you are looking for a small change in a large data set.

Chris B
  • 5,311
  • 11
  • 45
  • 57
  • This does _not_ work particularly well, however, for binary files such as Excel's XLS format. – Charles Wood Dec 17 '13 at 16:07
  • 1
    @CharlesWood - actually it _does_ work particularly well. Tortoise uses Excel itself to provide the diff and highlights the different cells in red. I've not tried it but I'm pretty sure it also does the same for Word .doc and .docx files (using Word that is as the diff viewer). – Chris B Dec 31 '13 at 12:09
  • Whaaat! Mine doesn't do that. Is it a new feature or did you install a plugin? – Charles Wood Jan 02 '14 at 16:19
  • :-D I've got TortoiseSVN 1.7.12 and it does it out of the box. The help file states that it has support for this - http://tortoisesvn.net/docs/release/TortoiseSVN_en/tsvn-dug-diff.html. It uses a script to diff the files which on my machine is C:\Program Files\TortoiseSVN\Diff-Scripts\diff-xls.js. Is it possible you have a group policy on your machine which has disabled scripting? – Chris B Jan 03 '14 at 10:22
1

I found an openoffice macro here that will invoke openoffice's compare documents function on two files. Unfortunately, openoffice's spreadsheet compare seems a little flaky; I just had the 'Reject All' button insert a superfluous column in my document.

1

xdocdiff plugin for SVN

graveDust
  • 11
  • 1
1

I got the problem like you so I decide to write small tool to help me out. Please check ExcelDiff_Tools. It comes with several key points:

  • Support xls, xlsx, xlsm.
  • With formula cell. It will compare both formula and value.
  • I try to make UI look like standard diff text viewer with : modified, deleted, added, unchanged status. Please take a look with image below for example: enter image description here
kokichi88
  • 51
  • 1
  • 6
1

I'm the co-author of a free, open-source Git extension:

https://github.com/ZoomerAnalytics/git-xltrail

It makes Git work with any Excel workbook file format without any workarounds.

Bjoern Stiel
  • 3,918
  • 1
  • 21
  • 19
0

I don't know of any tools, but there are two roll-your-own solutions that come to mind, both require Excel:

  1. You could write some VBA code that steps through each Worksheet, Row, Column and Cell of the two Workbooks, reporting differences.

  2. If you use Excel 2007, you could save the Workbooks as Open-XML (*.xlsx) format, extract the XML and diff that. The Open-XML file is essentially just a .zip file of .xml files and manifests.

You'll end up with a lot of "noise" in either case if your spreadsheets aren't structurally "close" to begin with.

lesscode
  • 6,221
  • 30
  • 58
  • From Excel 2002 onwards you can also save in 'XML Spreadsheet' format which is simpler than dealing with xlsx files. – Sam Warwick May 25 '09 at 13:16
0

Convert to cvs then upload to a version control system then diff with an advanced version control diff tool. When I used perforce it had a great diff tool, but I forget the name of it.

patrick
  • 16,091
  • 29
  • 100
  • 164
0

Here's what I ended up doing (requires openpyxl), based on how I previously approached git-diffing PowerPoint files. I also got it to work with BeyondCompare:



"""
Setup -- Add these lines to the following files:
--- .gitattributes
*.xlsx diff=xlsx

--- .gitconfig (or repo\.git\config    or your_user_home\.gitconfig) (change the path to point to your local copy of the script)
[diff "xlsx"]
    binary = true
    textconv = python C:/Users/myUserName/Miniconda3/Scripts/git-xlsx-textconv.py

[difftool "xlsx"]
    cmd = t1=`mktemp` && `python C:/Users/myUserName/Miniconda3/Scripts/git-xlsx-textconv.py $LOCAL >$t1` && t2=`mktemp` && `python C:/Users/myUserName/Miniconda3/Scripts/git-xlsx-textconv.py $REMOTE >$t2` && "C:/Program\\ Files/Beyond\\ Compare\\ 4/BCompare.exe $t1 $t2" && rm -f $t1 $t2

usage:
git diff your_excel.xlsx
git difftool --tool xlsx your_excel.xlsx

"""

import sys
from openpyxl import load_workbook


if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage: git-xlsx-textconv file.xslx")

    path_to_excel = sys.argv[1]

    wb = load_workbook(filename=path_to_excel, read_only=True)

    for sn in wb.sheetnames:
        ws = wb[sn]
        par_text = b'#<'
        par_text += sn.encode('utf-8')
        par_text += b'>#\n'
        rows = []
        for row in ws.rows:
            
            rowtext = b''
            is_empty = True
            for cell in row:
                s = '' if cell.value is None else str(cell.value).strip()
                s = s.replace("\\", "\\\\")
                s = s.replace("\n", " ")
                s = s.replace("\r", " ")
                s = s.replace("\t", " ")
                s = s.replace('\r\n', ' ')
                s = s.rstrip('\r\n')
                s = s.rstrip(r'')

                # Convert left and right-hand quotes from Unicode to ASCII
                # found http://stackoverflow.com/questions/816285/where-is-pythons-best-ascii-for-this-unicode-database
                # go here if more power is needed  http://code.activestate.com/recipes/251871/
                # or here                          https://pypi.python.org/pypi/Unidecode/0.04.1
                punctuation = { 0x2018:0x27, 0x2019:0x27, 0x201C:0x22, 0x201D:0x22 }
                s.translate(punctuation).encode('ascii', 'ignore')
                s = s.encode('utf-8')
                if s:
                    is_empty = False
                    rowtext += s
                rowtext += b'|'
            rows.append((rowtext, is_empty))
        rows_new = []
        non_emtpy_seen = False
        num_empties_at_end_of_sheet = 0
        # now figure out if there's a bunch of "empty" rows at the end of the sheet
        for r in reversed(rows):
            text, is_empty = r
            if is_empty and not non_emtpy_seen:
                num_empties_at_end_of_sheet+=1
                continue
            non_emtpy_seen = True
            rows_new.append(text)
        rows_new = list(reversed(rows_new))
        par_text += b'\n'.join(rows_new)
        if num_empties_at_end_of_sheet:
            par_text += f'\n<... and {num_empties_at_end_of_sheet} empty rows that are not shown...>\n'.encode('utf8')
        par_text += b'\n'
        print(par_text.decode('utf8'))
nmz787
  • 1,960
  • 1
  • 21
  • 35
-1

Diff Doc may be what you're looking for.

  • Compare documents of MS Word (DOC, DOCX etc), Excel, PDF, Rich Text (RTF), Text, HTML, XML, PowerPoint, or Wordperfect and retain formatting
  • Choose any portion of any document (file) and compare it against any portion of the same or different document (file).
ConroyP
  • 40,958
  • 16
  • 80
  • 86