2

I have about 30 different worksheets in a workbook that all have well over 30k entries in each of them. In the second column of each row, there is a path to a file.

This file name could be anything. I, on the other hand, have a list of about 450 different file names that I am checking against.

If the file path does not contain any of those 450 file names, then I want the row deleted.

I am trying to do this as quickly and efficiently as possible. I am not very experienced in VBA and Excel.

lfrandom
  • 1,013
  • 2
  • 10
  • 32
Johnrad
  • 2,637
  • 18
  • 58
  • 98
  • FYI this will take a super long time since you'll be checking (30000 * 30) * 450 – Elias Jul 30 '13 at 19:02
  • I understand this. I am trying to see if there are more efficient ways of doing this. To my knowledge, that is the only way. – Johnrad Jul 30 '13 at 19:03
  • If the filenames are stored in a hashtable, then the time drops to (30000 * 30)+450. – Brilliand Jul 30 '13 at 19:04
  • Per Brilliand's comment: http://stackoverflow.com/a/1309739/1504882 – Elias Jul 30 '13 at 19:05
  • 1
    This could also be done with a bunch of VLOOKUPs fairly quickly (provided you set the last parameter to true), and without writing any VBA. I imagine it would be about 5 times slower than using a hashtable/dictionary. – Brilliand Jul 30 '13 at 19:19
  • 2
    Yep. If you set it to "approximate match", it uses a binary search, and returns a wrong answer if it can't find a right one. If you set it to "exact match", it checks every value individually. Be sure to sort the filenames beforehand, and check whether the filename returned by the VLOOKUP actually matches the filename you're testing. – Brilliand Jul 30 '13 at 19:22
  • Is the "path to a file" in the sheets being searched a full path (including drive/folder hierarchy), or just the filenames? Likewise, is the list being compared against a list of full paths, or just the filename? – Tim Williams Jul 30 '13 at 20:00
  • @TimWilliams Sorry I was a little ambiguous. I am actually comparing against full file paths. – Johnrad Jul 30 '13 at 20:09

1 Answers1

3

One simple way to do this, without writing any VBA, is to use a VLOOKUP:

=VLOOKUP(A1, <absolute_reference_to_file_list>, 1, TRUE)=A1

Then use the filtering tools to delete all rows with FALSE in the new column. A small number of columns might display "#N/A"; this should be treated the same as FALSE.

Note that this method requires the list of filenames to be sorted in advance, since Excel will be performing a binary search on the list of filenames.

This isn't the fastest possible method, but it's fairly fast, and may well be sufficient for your needs.

Brilliand
  • 13,404
  • 6
  • 46
  • 58
  • Please excuse my unfamiliarity with excel. But how do I refer to ? I have the file list in another spreadsheet. – Johnrad Jul 30 '13 at 20:07
  • Excel will do most of the work for you if you put the cursor in the right place in the formula, then tab over to the file list (which you should also have open) and select the list of files. Actually, I think if you click the top of the column with the file list in it (i.e. the letter "A"), Excel will do exactly what you need. Then tab back to where you were editing the formula and press Enter. – Brilliand Jul 30 '13 at 20:13
  • Afterward, the absolute reference should look something like `'[File List.xlsx]Sheet1'!$A$2:$A$451` or `'[File List.xlsx]Sheet1'!A:A`. The dollar signs are what make the reference absolute. You probably don't need them if the reference is in the second form, though it can't hurt to add them. – Brilliand Jul 30 '13 at 20:17
  • Sorry to take back my comment. When running this, I am getting only getting TRUE or FALSE in the first row of the new column. Since its really no big deal, I just copied and pasted my fileNames into a seperate row inside of the spreadsheet I am trying to count. – Johnrad Jul 30 '13 at 20:31
  • So you're saying it worked after pasting the filenames into the same sheet? If the other cells in the new column are blank, you may need to fill down (double-click the lower right-hand corner of the cell while it's selected). – Brilliand Jul 30 '13 at 21:13