0

Okay - This has been asked multiple times, but asking again for best possible solution :

I have two excel files (not sheets). the first excel sheet is very huge and has close to 200,000 records. One of the column (Gender) is corrupted and i have to fix it.
I have a second excel file and it has only around 200 records - these have the correct value for those ones which are messed up.

for eg:
Master List with 200,000 records

and this is the file that has correct values with only around 200 records (only the corrupted ones). enter image description here

Now i need a macro , where i need to find these exact 200 records out of 200,000 records (by employee id) and replace the Gender value with correct one.
i found something similar here. but i dont want to loop 200,000 records 200 times. feels like a performance overhead. is there a better option?

I am thinking an ideal solution would be

  1. Loop through 200 items and use employee id per loop
  2. Take that employee id and do a "Find" operation in the Employee id column of the master excel
  3. If found, replace the Gender column value

would there be any other better solution? Any inputs is gladly appreciated

Community
  • 1
  • 1
Night Monger
  • 770
  • 1
  • 10
  • 33

3 Answers3

1

One way to do this through VBA is to just loop through the 200 corrections, comparing the ID with the MATCH function to find the row it belongs on, as opposed to a second loop (a second loop through 20000 would take ages like you say).

For the below sub I have copied and pasted the 200 table into columns 5:7 of the 20000 table, you can either automate this part easily enough, or just put in the correct sheet references for each part of the code.

I've also put in a checking line to make sure there IS a match for the current ID from the small table, otherwise it'd throw up an error. You could put an ELSE in front of the END IF in this error catch to highlight any ID's which weren't actually found. Here's the code, hope this method helps!

Sub replace_things()
With ActiveSheet
For x = 2 To 200 'Change this to however many is in the small table
    cur = .Cells(x, 5) 'Defined cur as ID from small table
    aMatch = Application.WorksheetFunction.CountIf(.Range("A:A"), cur) 'Check to see there's a match in large table
    If aMatch > 0 Then ' if there's a match then...
        theRow = Application.WorksheetFunction.Match(cur, .Range("A:A"), 0) 'get the row number the match is actually on
        .Cells(theRow, 3) = .Cells(x, 7) 'when row is found, replace with the relevant value from col7 (col3 of small table)
    End If
Next x
End With
End Sub
jamheadart
  • 5,047
  • 4
  • 32
  • 63
  • Thanks @jamheadart .. was looking for something similar to this. – Night Monger Sep 28 '16 at 15:19
  • No probs! There's probably a slightly more elegant way to write the code using `Evaluate(COUNTIF(.range("A:A"), cur)` and `Evaluate(MATCH etc` but I didn't get a chance to try it and not sure it'll affect performance too much. – jamheadart Sep 28 '16 at 15:23
  • I am running into a issue.. while the first line worksheetfunction.countif works and returns 1 record, the next line - worksheetfunction.match returns nothing and throws 1004 error. Any idea why? – Night Monger Sep 28 '16 at 22:23
  • That is weird.. it can find "cur" well enough to register it but then can't find it again in the same range using `MATCH` .. did the code change much at all? Maybe missing the `.` in front of the range("A:A"). I'm hoping it's something simple like that and not a weird limit to how many rows `match` can find. If you want you could copy/paste the code in to your original question I may be able to help more. – jamheadart Sep 28 '16 at 22:41
  • There's not many places it can go wrong in that one line really, make sure it's `Application.WorksheetFunction.Match` also `theRow` variable is an integer? Like I mentioned make sure the `range` is preceded by the correct worksheet declaration (it's just `.` on my example cos they were on the same sheet. – jamheadart Sep 28 '16 at 22:45
  • No, so i did some research and change it to Application.Match which seems to be more elegant. it handled the error but it had an error code 2042 which means, that match was not found which is weird. – Night Monger Sep 28 '16 at 23:14
  • The code did not change at all. it is exactly the same as it is above. So i am thinking to replace the match function with find and see if there are any changes to the result – Night Monger Sep 28 '16 at 23:15
  • Find worked absolutely fine. not sure what the problem was with match. thank you for your help – Night Monger Sep 28 '16 at 23:47
  • 1
    I have no idea then. I've used this method a few times and tried putting values way down in to the 20,000 rows to see if there was a match limit or something but I can't get any errors from it. As long you have it working you way :) – jamheadart Sep 29 '16 at 05:08
0

A super quick way, copy your CORRECT employee ID list and paste below the CORRUPT employee ID list... highlight duplicates and correct the highlighted.

Otherwise a VLOOKUP could label which ones are corrupt? basically getting a unique field from your correct list and comparing that to your corrupt list then fixing the ~200 errors.

Bailey
  • 15
  • 6
  • Thanks. thats what we are doing now manually. but this is prone to error and since we have to do it for 200+ Records, also we need to do it 3 times a day which is kind of painful. Thats why planning to do it in a macro. – Night Monger Sep 28 '16 at 01:02
  • `=IF(VLOOKUP(A1,[File1.xls]Sheet1!A:A,1,FALSE),[File1.xls]Sheet1!C1," ")` You can paste this into the D column on your second sheet, youll need to change the [File1.xls] to your workbook name. This will search if the ID is found on File 2 in File1 then return column C (Gender) from File1 if it is found. You can then see if it matches what you have and make adjustments a bit easier this way. This can be completely reversed and done on Sheet1 looking up Sheet2, the purpose of this is to have a side by side of the incorrect/correct genders in the ONE sheet so you can quickly make changes – Bailey Sep 28 '16 at 01:29
  • The next step we can look into if you need to automate even further is a way we can show only your ~200 correct results lined up side by side against the ~200 incorrect results and simply copy/paste the gender column onto it. Data will have to be manipulated for this but I believe it can be possible. – Bailey Sep 28 '16 at 01:33
0

I assume that the employee ID is a unique record so you can paste the correct ones under existing ones, sort by empID and highlight duplicates to find them easily.

Pav
  • 286
  • 1
  • 12