1

I have a project to compare report outputs between one system and another. I thought using Excel as the output format would enable me to compare the outputs easily. The layout of the reports doesn't lend itself to using CSV files.

Problems:

  • Although the data in the outputs may be the same, it may be sorted differently.
  • The report could be formatted with numerous data ranges.

I know there are several tools to perform Excel file comparisons, but none of them appear to ignore the sort of the data. They appear to be simple, cell-by-cell comparisons.

So, before I compare all of the cells between the two worksheets, I need to sort the data within each region. Trying to identify all of the regions on a worksheet I ran into a problem. The CurrentRegion property appears to produce incorrect results.

(Too much back story?)

Using Microsoft Excel 2016 and Microsoft Visual Basic for Applications 7.1.

I'm using code like this:

Debug.Print Range("A1").CurrentRegion.Address

In Microsoft's documentation, the CurrentRegion property is defined as

Returns a Range object that represents the current region. The current region is a range bounded by any combination of blank rows and blank columns. Read-only.

The documentation doesn't specify, but I assume that the CurrentRegion of a cell that is in a blank row or blank column is just that cell.

Given the sample data below, UsedRange correctly returns $A$1:$E$13.

Here is a matrix showing the values I expect and the values I get from CurrentRegion:

╔══════╦════════════╦════════════╗
║ Cell ║  Expected  ║   Actual   ║
╠══════╬════════════╬════════════╣
║ A1   ║ $A$1       ║ $A$1:$B$1  ║
║ B1   ║ $B$1       ║ $B$1       ║
║ B2   ║ $B$2       ║ $A$1:$C$4  ║
║ B3   ║ $A$3:$C$4  ║ $A$3:$C$4  ║
║ B6   ║ $B$6       ║ $B$6       ║
║ B7   ║ $B$7       ║ $B$7:$E$13 ║
║ C10  ║ $C$7:$E$13 ║ $C$7:$E$13 ║
╚══════╩════════════╩════════════╝

What am I doing wrong? Is there some configuration that I have wrong? Some setting I missed? Or did Microsoft fail to test boundary conditions?

Is there a better way to identify each region on a worksheet?

Is there a better way to compare these files?

╔════╦═══╦════╦══════╦══════╦═════╦═══╗
║    ║ A ║ B  ║  C   ║  D   ║  E  ║ F ║
╠════╬═══╬════╬══════╬══════╬═════╬═══╣
║  1 ║   ║  1 ║      ║      ║     ║   ║
║  2 ║   ║    ║      ║      ║     ║   ║
║  3 ║ 4 ║ 10 ║      ║      ║     ║   ║
║  4 ║   ║    ║ 2    ║      ║     ║   ║
║  5 ║   ║    ║      ║      ║     ║   ║
║  6 ║   ║    ║      ║      ║     ║   ║
║  7 ║   ║    ║      ║ asdf ║     ║   ║
║  8 ║   ║    ║ asdf ║ rtuy ║     ║   ║
║  9 ║   ║    ║ asdf ║ vbnm ║     ║   ║
║ 10 ║   ║    ║ vbnm ║      ║     ║   ║
║ 11 ║   ║    ║ vbnm ║      ║     ║   ║
║ 12 ║   ║    ║ fgjh ║ rtyu ║ jkl ║   ║
║ 13 ║   ║    ║ fghj ║      ║     ║   ║
║ 14 ║   ║    ║      ║      ║     ║   ║
╚════╩═══╩════╩══════╩══════╩═════╩═══╝

To clarify based on Eric Murpy's answer

The layout can be anything. But for any given report, it will match between the source and target systems.

I considered sorting the entire UsedRange. But a report may have two tables next to each other. I would need to sort the two tables independently of each other.

No cell highlighting. I'll be comparing about 20,000 reports, so the final solution will be VBScript, JavaScript, or PowerShell that either uses or reproduces the functionality I'm trying to create in VBA. The output will be text files.

The reports in the different reporting environments will be generating SQL using different SQL generation routines. They should produce the same results, but there's no guarantee the data will be returned from the database server in the same order. This wouldn't be a problem if the output was deliberately sorted in the reports, but that's not a feature all report developers care about.

The specific task I am having trouble with is accurately identifying all of the data regions in each worksheet so I can sort them. (This will be considerably faster than manually editing each report to ensure the sorting is correct.) Then the cell-by-cell comparison between the two spreadsheets will work.

Mikku
  • 6,538
  • 3
  • 15
  • 38
dougp
  • 2,810
  • 1
  • 8
  • 31
  • To answer the final question to your post, there're numerous ways to compare these files. Regarding the examples you have with addresses, those are all true. Essentially, the address that you're outputting with `.currentrange.address` is the UPPER-LEFT bound. if there is no data, it attempts to find the contiguous data, and of that data range the last used column and last used row, hence your `$B$7 ║ $B$7:$E$13`. – Cyril Aug 08 '19 at 18:24
  • To give a little more guidance for one of the numerous ways to perform this type of comparison, you may want to use 2 `variant arrays` and compare them, such that you can `match()` values in the array to determine what *row* you are comparing in each array. This is made easier if you have a unique identifier, otherwise you can go 1 by 1 and compare column to column to determine if anything is similar/same ([i made a similar post yesterday](https://stackoverflow.com/a/57364647/3233363)). – Cyril Aug 08 '19 at 18:26
  • `The layout of the reports doesn't lend itself to using CSV files.` I'm confused - why is this? At the end of the day it looks like you have data in columns and rows and will be comparing alphanumeric data. – G42 Aug 08 '19 at 18:45
  • Current region will always expand outwards (in all directions) until it hits an empty row/column or the edge of the sheet, as long as there is some "adjacent" content- ie. directly touching the "origin" (top/bottom/left/right/diagonally) – Tim Williams Aug 08 '19 at 19:04
  • gms0ulman -- Shen the reporting system creates a CSV output, it is often not the whole report -- just the results of the first query of many the report uses. – dougp Aug 08 '19 at 19:04
  • Cyril -- If my output has two different tables separated by a single row, CurrentRegion will grab both tables when I'm inspecting the intervening row. Sorting that will not make any sense. – dougp Aug 08 '19 at 19:08
  • Cyril -- Variant arrays containing what? The contents of the regions I can't seem to accurately identify? – dougp Aug 08 '19 at 19:10

1 Answers1

0

Your post is somehow chaotic.

First of all, there is no magical algorithm which could compare two unsorted sets of data because of one simple fact: the same value or entry may appear in multiple columns etc.

You'd need to specify more clearly what kind of output there COULD be. E.g. are you expecting each report to have a fixed number of columns or rows? Could data repeat in multiple columns e.g. value 1.05 in column C and F?

Finally, what do you expect the macro to do in the end - just output a message if both reports are the same or not, or highlight differences (this could potentially be very hard)?