
I have the following task:

String A: known for certain, and it may contain 0, O, i, I and 1. Think of it as a kind of article number.

String B: a string coming from OCR. Here some of the "0"s may appear as "O", and some "1"s as "l". The string is about 300 characters long (just to give a feeling for the size).

Now I would like to know whether the OCR text contains my article number. In principle, I first have to check the OCR text as it is. If the number is not found, I replace the first "O" with a "0" and try again, and so on until I have tried all combinations.

My idea was to define some arrays that list which characters look alike:

[
    ["i", "l", "j", "1"], 
    ["0", "o"], 
    [".", "*"]
]

To reduce the array size (and therefore the number of possible combinations), I will convert everything to lower case.

Now the hard work starts. Do you know a smart way to walk through the combinations?
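One possible way to sidestep the enumeration, sketched in Python (the question names no language, so that choice is an assumption): turn the article number into a regular expression in which every character that belongs to one of the groups becomes a character class covering the whole group. A single search over the OCR text then covers all combinations at once. The names `SIMILAR`, `build_pattern` and `contains_article` are made up for this illustration.

```python
import re

# Look-alike groups from the question, already in lower case.
SIMILAR = [["i", "l", "j", "1"], ["0", "o"], [".", "*"]]

# Map each character to the group it belongs to.
GROUP_OF = {ch: group for group in SIMILAR for ch in group}


def build_pattern(article_number: str) -> re.Pattern:
    """Compile a regex that matches the article number under the OCR confusions."""
    parts = []
    for ch in article_number.lower():
        group = GROUP_OF.get(ch)
        if group is None:
            # Character has no look-alikes: match it literally.
            parts.append(re.escape(ch))
        else:
            # Character has look-alikes: accept any member of its group.
            parts.append("[" + "".join(re.escape(c) for c in group) + "]")
    # IGNORECASE plays the role of lower-casing the OCR text.
    return re.compile("".join(parts), re.IGNORECASE)


def contains_article(ocr_text: str, article_number: str) -> bool:
    """Return True if the OCR text contains the article number or a look-alike."""
    return build_pattern(article_number).search(ocr_text) is not None


if __name__ == "__main__":
    print(contains_article("Order no. AB-1O0l shipped", "AB-1001"))
```

The pattern has the same length as the article number, so the cost grows with the length of the OCR text and not with the number of ambiguous characters. The price is that any look-alike spelling counts as a hit, so a genuinely different number such as "AB-lOOl" would match as well.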

Thank you very much in advance for your help!

Wahyu Kristianto
  • This may point you in the right direction: https://stackoverflow.com/questions/5506888/permutations-all-possible-sets-of-numbers - it becomes more complex, as the number of permutations changes depending on the number of "lazy" matches – Rylee Feb 07 '22 at 06:45
  • Thank you very much for your helpful link! Yes, it seems this gets very time-consuming with 10 "i"-like and 10 "0"-like letters; that results in several million possibilities. Perhaps I have to convert all "i"-like letters to "i" and so on, and then compare the results. There is some risk of matching the wrong strings, but it is much faster than my first idea :-( – user3731513 Feb 07 '22 at 13:59
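The normalization idea from the last comment could look roughly like the following Python sketch. It assumes that every member of a group is replaced by the first member of that group, and that both strings are lower-cased first; `CANONICAL`, `canonicalize` and `contains_normalized` are hypothetical names chosen for the example.

```python
# Look-alike groups from the question, already in lower case.
SIMILAR = [["i", "l", "j", "1"], ["0", "o"], [".", "*"]]

# Translation table: every member of a group maps to the group's first member.
CANONICAL = str.maketrans({ch: group[0] for group in SIMILAR for ch in group})


def canonicalize(text: str) -> str:
    """Lower-case the text and collapse each look-alike group to one character."""
    return text.lower().translate(CANONICAL)


def contains_normalized(ocr_text: str, article_number: str) -> bool:
    """Compare both strings after normalization with a plain substring search."""
    return canonicalize(article_number) in canonicalize(ocr_text)


if __name__ == "__main__":
    # Both spellings collapse to the same canonical form.
    print(canonicalize("AB-1O0l"), canonicalize("AB-1001"))
    print(contains_normalized("Order no. AB-1O0l shipped", "AB-1001"))
```

Each string is normalized once, so no combinations have to be generated. As the comment already notes, the trade-off is that two different article numbers that differ only in look-alike characters can no longer be told apart.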

0 Answers