Assume the string:
item1, item1N, item1Z, item1fhg, item1_any_letters, item2, item3, item3N, item3H
My goal output is simply:
item1, item2, item3
The data is currently an Excel file of about 100,000 lines, but it can be temporarily migrated to another program if needed.
Essentially, I need to detect duplicates: values that share the same initial phrase ending in a number, regardless of any letters after the number. Some phrases have more text before the number, for example "Brand item2, Brand item34". The only thing that separates a duplicate from its base value is whatever text comes AFTER the number.
Any ideas on where to begin with this? Each string usually has between 2 and 500 values in it, separated by a comma and a space. No comma follows the final value.
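
For reference, the rule described above could be sketched in Python roughly as follows. This is only a starting point under some assumptions: each cell is handled as one string, "the number" means the first run of digits in a value, and values with no digits are kept unchanged. The names `dedupe_items` and `_PREFIX` are made up for illustration.

```python
import re

# Assumption: the key for each value is everything up to and including
# its first run of digits, e.g. "item1_any_letters" -> "item1".
_PREFIX = re.compile(r".*?\d+")


def dedupe_items(cell: str) -> str:
    """Collapse values that differ only in the text after the number."""
    seen = {}  # dicts keep insertion order, so first occurrences stay in order
    for value in cell.split(", "):
        value = value.strip()
        match = _PREFIX.match(value)
        # Values without any digit are kept as-is (an assumption).
        key = match.group(0) if match else value
        seen.setdefault(key, None)
    return ", ".join(seen)


example = ("item1, item1N, item1Z, item1fhg, item1_any_letters, "
           "item2, item3, item3N, item3H")
print(dedupe_items(example))  # item1, item2, item3
```

Applying this to every row would still require getting the column out of Excel first (for example via a CSV export), which is not shown here.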