1

I'm dealing with a spreadsheet containing ranges of Bates numbers for legal discovery. All numbers are six digits. A typical cell might be named something like "Court records 000001-000100" or "Search warrant 000300-000300." Is there are way to identify cells where the SAME six-digit string is repeated, as in the second example, and replace it with only the first instance of that string? E.g., "Search warrant 000300."

I prefer a formula, but VBA would also work. I've found methods for identifying ANY six-digit string, but not one that will specifically look for the same number twice. Thank you so much for any suggestions you may have!

Livia
  • 15
  • 4
  • `I've found methods for identifying ANY six-digit string, but not one that will specifically look for the same number twice.` maybe you can share that, because that might be helpful for us helping you. – P.b Dec 06 '21 at 17:47
  • 2
    This was one suggestion: https://stackoverflow.com/questions/47935549/how-to-extract-the-six-digit-number-in-a-cell-string This was another: https://stackoverflow.com/questions/62493931/extract-fixed-length-number-from-excel-cell However, I don't believe either of these would let me specify that I need the same number twice. – Livia Dec 06 '21 at 17:50
  • Which excel-version are you using? – OverflowStacker Dec 06 '21 at 17:55
  • Looks like Excel for Office 365. – Livia Dec 06 '21 at 18:00
  • 1
    Looking at the sample data, maybe just `=SUBSTITUTE(A1,"-"&MID(A1,FIND("-",A1)-6,6),"")` may work? – JvdV Dec 07 '21 at 06:17
  • It can be so easy. :) – OverflowStacker Dec 07 '21 at 08:07
  • That's a great suggestion, but unfortunately many file names are much more complex and contain a dash in the middle (E.g. "2016 tax records - Smith," "2016 tax records - Jones," etc.) – Livia Dec 07 '21 at 13:53

2 Answers2

3

With o365 you can use

=IF(LET(arr,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"&",""),"_"," "),"-"," "),".pdf","")," ","</s><s>")&"</s></t>","//s"),AGGREGATE(15,6,arr/ISNUMBER(arr),1)=AGGREGATE(15,6,arr/ISNUMBER(arr),2)),LEFT(A1,SEARCH("-",A1)-1)&".pdf",SUBSTITUTE(A1,".pdf","")&".pdf")

For other versions one needs to replace every arr with the FILTERXML-part. If it is possible that there are other punctuation marks around the numbers, then these must also be substituted. enter image description here

EDIT 1: Just read, that you want to delete the duplicate.

EDIT 2: This formula would keep the .pdf-file-extension.

EDIT 3: Preparing for _ and &. This assumes, that already redacted documents never have duplicates numbers, as they are already redacted.

OverflowStacker
  • 1,340
  • 1
  • 10
  • 17
  • Should the formula keep the `.pdf`-part? – OverflowStacker Dec 06 '21 at 20:32
  • Thank you! This works beautifully, except that I neglected to mention that the file names end in a ".pdf." Mea culpa! This appears to create an issue with stripping out the period in the file suffix and making it difficult for the formula to recognize the duplicate. Is there a way around that? – Livia Dec 06 '21 at 20:41
  • Perfect!!! Thank you so much - you have no idea how much of a timesaver this will be! – Livia Dec 06 '21 at 20:53
  • Glad to help. You can mark as answer if you are satisfied. Have a nice day or evening as it is in my case. :) – OverflowStacker Dec 06 '21 at 20:55
  • I will! I am running into one issue ... in a case where a file is already redacted, I am getting a #VALUE! or $NUM! error (probably because the file name in that case is something like "Business records 000200-000250_Redacted.pdf". Is there a workaround for that?It also seems to generate sometimes when the file is NOT redacted but contains an ampersand -- for example "Letter & probate documents 000300-000310". – Livia Dec 06 '21 at 21:59
  • In the `FILTERXML`-part you can nest more `SUBSTITUTE`-functions like the three ones already in there for possible connecting-chars like the underscore and ampersand. – OverflowStacker Dec 07 '21 at 05:56
  • When I'm home later, I can extend the formula. – OverflowStacker Dec 07 '21 at 06:08
  • I tried that but I may have placed it in the wrong place. I have just enough Excel knowledge to be dangerous, lol. – Livia Dec 07 '21 at 14:09
  • Ok, i implemented the other things. But my formula should also fail for the cases you mentioned to JvdV, if there are duplicates for the numbers. – OverflowStacker Dec 07 '21 at 16:28
  • Works great. Thank you!! – Livia Dec 07 '21 at 18:59
1

If not already in a table, turn it into a table using Insert Table from the data ribbon. This allows you to refer to other columns by name and store column formulas once per *column" rather than have to drag it down the column and store the formula once per cell.

Assuming your doc name column title is "DocName", add a column titled "JustNumbers" with a formula of =RIGHT([@DocName],13)

Then add another column with a formula of

`=IF(LEFT([@JustNumbers],6)=RIGHT([@JustNumbers],6), LEFT([@DocName],LEN([@DocName])-7),[@DocName])`

You can hide the other columns and just show this last column.

JSmart523
  • 2,069
  • 1
  • 7
  • 17