1

I get a report daily and it has around 8500 lines on it, I have been trying to make a macro that will do what I have been doing manually. The problem with the report is that all the lines are not the same format(i.e Line 1: number,text,text,number and Line 2: text, number, number, text).

I want to compare the new file to the old file and output the new differences. I can get the macro to run through the 2 files but it isn't flagging any of the lines as different but I know they are.

Sub test()

Dim yesterdayFile As String
Dim todayFile As String

yesterdayFile = Application.GetOpenFilename()
todayFile = Application.GetOpenFilename()
Dim yesterdayLine As String
Dim todayLine As String
Dim txt As String
Dim i, j, k, sameLine As Integer
Dim wkbTemp As Workbook
i = 1
j = 1
k = 1
sameLine = 0


Open yesterdayFile For Input As #1
Do Until EOF(1)
    sameLine = 1 'reset write operator
    Open todayFile For Input As #2
Line Input #1, yesterdayLine
    Do Until EOF(2)
        Line Input #2, todayLine
        If StrComp(yesterdayLine, todayLine) = 0 Then 'compare lines in files if same then flag write operator to 1
           sameLine = 1
        End If
        j = j + 1 'inner loop counter
    Loop
If sameLine = 0 Then 'if write operator is not active then output line
    Cells(i, 1) = yesterdayLine
    i = i + 1 'counter for cells
End If
Close #2
k = k + 1 'outer loop counter
Loop

'test line to see if its eof
Cells(1, 10) = i
Cells(2, 10) = j
Cells(3, 10) = k

Close #1



End Sub

Test files being used to run the test faster:

Yesterday File:

10001,April,Apple
10002,Book,Bush
10004,Dog,Days
10006,Free,Food
10008,Happy,Help
10009,Ikky,Icing
10010,Jamming,Jupiter

Today File:

10001,April,Apple
10002,Book,Bush
10003,Cat,Cattle
10004,Dog,Days
10005,Echo,Eggg
10006,Free,Food
10007,Good,Game
10008,Happy,Help
10009,Ikky,Icing
10010,Jamming,Jupiter

Note: In the real data there is no "unique ID field"

Counters end with correct numbers so I know its running through the recursions

EDIT: I know this can be done easily with other languages, but I can only access excel vba from my work terminal and can't take files off network.

Community
  • 1
  • 1
Nealin
  • 11
  • 1
  • 5
  • The exact cells must be different? Or if the order of a line is different, but the values are the same it is ok? – danieltakeshi Sep 18 '17 at 16:55
  • It is testing the input lines from the files not the direct cells, they are CSV files so the whole row is listed as "text" | "number" | "number" | "text". Delimiting is a problem I will work on when I can actually get the lines to output. – Nealin Sep 18 '17 at 17:04
  • I think you will have to work with [arrays](https://stackoverflow.com/questions/34563525/improving-the-performance-of-for-loop/34564306#34564306) and match both multidimensional arrays. Or you can input the csv data on the cells and match them inside a spreadsheet. [Performance of methods](https://fastexcel.wordpress.com/2011/10/26/match-vs-find-vs-variant-array-vba-performance-shootout/) – danieltakeshi Sep 18 '17 at 17:08

2 Answers2

0

If I'm understanding, you have 4 fields that are used in both the old and new workbook, assuming the same number of lines/tasks in each wb. It's not pretty, but you can assess similar to:

Dim i as Long, j as Long, k as Long, l as Long, m as Long

If wbNew.shNew.Cells(i, 1).Value = wbOld.shOld.CellS(i,1).Value OR wbNew.shNew.Cells(i, 1).Value = wbOld.shOld.CellS(i,2).Value OR wbNew.shNew.Cells(i, 1).Value = wbOld.shOld.CellS(i,3).Value OR wbNew.shNew.Cells(i, 1).Value = wbOld.shOld.CellS(i,4).Value OR Then
    j=1
End If

If wbNew.shNew.Cells(i, 2).Value = wbOld.shOld.CellS(i,1).Value OR wbNew.shNew.Cells(i, 2).Value = wbOld.shOld.CellS(i,2).Value OR wbNew.shNew.Cells(i, 2).Value = wbOld.shOld.CellS(i,3).Value OR wbNew.shNew.Cells(i, 2).Value = wbOld.shOld.CellS(i,4).Value OR Then
    k=1
End If

If wbNew.shNew.Cells(i, 3).Value = wbOld.shOld.CellS(i,1).Value OR wbNew.shNew.Cells(i, 3).Value = wbOld.shOld.CellS(i,2).Value OR wbNew.shNew.Cells(i, 3).Value = wbOld.shOld.CellS(i,3).Value OR wbNew.shNew.Cells(i, 3).Value = wbOld.shOld.CellS(i,4).Value OR Then
    l=1
End If

If wbNew.shNew.Cells(i, 4).Value = wbOld.shOld.CellS(i,1).Value OR wbNew.shNew.Cells(i, 4).Value = wbOld.shOld.CellS(i,2).Value OR wbNew.shNew.Cells(i, 4).Value = wbOld.shOld.CellS(i,3).Value OR wbNew.shNew.Cells(i, 4).Value = wbOld.shOld.CellS(i,4).Value Then
    m=1
End If

If (i+j+k+l)=4 Then
    wbNew.shNew.Rows(i).Interior.Color=2
End If

j=0
k=0
l=0
m=0

Using wbNew and shNew to denote using the newest workbook, and wbOld and shOld for yesterday's workbook. This would all be inside of a loop, and you would need to find the last row.


You could also take an approach to use Find() or Match(), such that:

z = Application.Match(wbNew.shNew.Cells(i,1),wbOld.sheOld.Columns(1)).Row
If wbNew.shNew.Cells(i,1).Value = wbOld.shOld.Cells(z, 1).Value OR wbNew.shNew.Cells(i,1).Value = wbOld.shOld.Cells(z, 2).Value OR wbNew.shNew.Cells(i,1).Value = wbOld.shOld.Cells(z, 3).Value OR wbNew.shNew.Cells(i,1).Value = wbOld.shOld.Cells(z, 4).Value Then
    j=1
End If

This would be more the case if you had anything unique per line, so you can find z, then do the comparison only needing to loop through the one workbook/sheet.


Edit:

Adding an example of looping through both columns and rows (nested loop), and marking the cell interior to flag if true:

Dim r as long, c as Long

For r = 1 to LR
    For c = 1 to LC
        If Cells(r, c).Value = "Moo" Then
            If Cells(r, c).Interior.Color <> 2 Then
                Cells(r, c).Interior.Color=2
            End If
        End If
    Next c
Next r

It might help to make the cell you want to work with a dimension, such as:

Dim y as Variant

y = wbNew.shNew.Cells(r, c).Value

This just makes it easier to edit, imo.

Cyril
  • 6,448
  • 1
  • 18
  • 31
  • The example code is only a few fields but the real data ranges from 19 to 42 fields. So using this for 42 fields would be quite a lot, Which is why I was testing the input lines directly to each other. – Nealin Sep 18 '17 at 17:08
  • @Nealin are there headers at all on the page? Would make it easier so you can denote the exact cells to deal with. – Cyril Sep 18 '17 at 17:09
  • Unfortunately the files are CSV files that contain about 8 separate tables so the header does not work for the whole length of the files – Nealin Sep 18 '17 at 17:11
  • @Nealin With that many dynamic fields, you might run into issues without formatting the data per table, first. You would work with Arrays in both cases (current and formatted). I don't think I've an efficient way to go about that process for (worst case). You would essentially have two loops, going down rows and the column for each row. – Cyril Sep 18 '17 at 17:14
  • So using VBA I can't just compare the direct input lines from the files which is what I have been attempting to do with the code posted above. – Nealin Sep 18 '17 at 17:17
  • 1
    Yes, you have an array and [it is said](https://stackoverflow.com/a/18769246/7690982) that looping is more efficient than using Match(). And another [performance test](http://dailydoseofexcel.com/archives/2012/07/03/find-matching-data-in-array-speed-test/) is here. In the end you will have to use a loop for each value, either way you choose to use. – danieltakeshi Sep 18 '17 at 17:27
  • 1
    @Nealin Dan has the posts to support it. I added in an example of a loop to show both row & column iteration. Hopefully that gets you where you need to be. – Cyril Sep 18 '17 at 17:49
0

So after Much trial and error I answered my own question, Thank you for all the responses but going from 72mil iterations to 3bil just wasn't an option.

What my code ended up looking like

Sub test()
'Freeze window
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual

'open files to edit
Dim fileA, fileB As String
fileA= Application.GetOpenFilename()
fileB = Application.GetOpenFilename()

'setting variables
Dim lineA, lineB, DQ As String 'read in lines and double quote     variables
Dim i, sameLine As Integer 'row counter and testing(could have used boolean?)
Dim newLine 'object creation for array of line
i = 1
DQ = Chr(34) 'character 34 is "

Open fileA For Input As #1 'open file 1 for append
Do Until EOF(1) 'Outter loop to run through file 1
    sameLine = 0 'reset write operator
    Open fileB For Input As #2 'open file 2 for append
    Line Input #1, lineA'read in line from file 1
    Do Until EOF(2) 'inner loop to run through file 2
        Line Input #2, lineB 'read in line from file 2
        If StrComp(lineA, lineB) = 0 Then 'compare lines in files if same then flag write operator to 1
            sameLine = 1
        End If
    Loop
    If sameLine = 0 Then 'if write operator is not active then output line
        count = Len(lineA) - Len(Replace(lineA, "|", ""))     'count number of columns needed for output
        lineA= Replace(lineA, DQ, "") 'removing all double     quotes from line
        newLine = Split(lineA, "|") 'spliting line into object with | as delimiter
        For counter = 1 To count 'placing line in row
            Cells(i, counter) = newLine(counter - 1)
        Next counter
        i = i + 1 'counter for cells
    End If
Close #2
Loop
Close #1
'unfreezing window
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
End Sub

This works on .txt and .csv and compares the input lines directly BEFORE writing them to any cells. My problem was unique because of time stamps at the end of each line which I added a few lines to fix.

Nealin
  • 11
  • 1
  • 5