
I am using OCR in a receipt-reading app I'm building. Understandably, the OCR struggles to distinguish between an S and a 5.

My app finds each line in a restaurant receipt normally formatted like below:

 1 Champagne             £505.55
 5 Burger with chips     £25.00
 2 Chips with cheese     £5.00
 2 Coke                  £1.50
 1 Ketchup               £0.50   
 5 Penny sweets          £0.05

Currently I can find the Int and the text fine. I can also get the Double at the end, but rarely when it contains a five. Is there a regex I can use to work out whether a 5 has been replaced by an S by looking at its surroundings? At the moment my only idea is to recognise the currency symbol and replace any occurrences after it, but sometimes the OCR struggles to recognise the symbol, or there isn't one. Any suggestions or help would be great. Thanks

Edit: I understand there may not be a perfect answer for tough prices like £555.55 that appear as SSS.SS, but if there is something for the more common prices like 0.50, 10.50 or 5.00 etc. I'd love to hear some suggestions. Thanks again

Update:

mutating func replaceWhereFivesShouldBe() {

    do {

        let regEx = try! NSRegularExpression(pattern: "\\s+[0-9S]+\\.[0-9S]{2}")
        let range = NSMakeRange(0, self.characters.count)
        self = regEx.stringByReplacingMatches(in: self, range: range, withTemplate: "5")
    } catch {

        return
    }

}
Wazza

1 Answer


Use a regex to match the price after the £ on the line, then replace all "S" chars in that match with "5". This assumes that you will only ever have non-alpha characters (specifically no genuine S characters) after the currency symbol. This regex should work:

£[0-9S]+\.[0-9S]{2}

From there, find the index of your S chars and replace them with 5.
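As a rough sketch of that step in Swift (the function name `fixFivesAfterPound` is made up for illustration, and it assumes one price per line with the £ correctly recognised), you could locate the match and rewrite only the characters inside it:

```swift
import Foundation

/// Hypothetical helper: replaces "S" with "5" only inside the first
/// substring that looks like "£" followed by a price.
func fixFivesAfterPound(_ line: String) -> String {
    // Backslashes are doubled because this is a Swift string literal.
    let pattern = "£[0-9S]+\\.[0-9S]{2}"
    guard let priceRange = line.range(of: pattern, options: .regularExpression) else {
        return line // no price found, leave the line untouched
    }
    // Only the matched price is rewritten; the item description is not touched.
    let fixedPrice = line[priceRange].replacingOccurrences(of: "S", with: "5")
    return line.replacingCharacters(in: priceRange, with: fixedPrice)
}
```

Because the replacement is limited to the matched range, a capital S in the item name is left alone.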

In the case that the currency symbol isn't present (or detected), just using the regex to identify the currency amount should work. Based on your example, I wouldn't expect to find that pattern in the item description. Something like this:

\s+[0-9S]+\.[0-9S]{2}

Or if the currency symbol is garbled, wildcard it like:

\s+.[0-9S]+\.[0-9S]{2}
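One possible way to combine the three patterns is to try them from most to least specific and take the first one that matches. This is only a sketch: the names `pricePatterns` and `findPriceText` are invented for the example, and it assumes each string passed in is a single receipt line.

```swift
import Foundation

/// The three patterns above, ordered from most to least specific.
/// Backslashes are doubled because these are Swift string literals.
let pricePatterns = [
    "£[0-9S]+\\.[0-9S]{2}",      // currency symbol recognised
    "\\s+[0-9S]+\\.[0-9S]{2}",   // no currency symbol on the line
    "\\s+.[0-9S]+\\.[0-9S]{2}"   // currency symbol garbled into another character
]

/// Hypothetical helper: returns the raw price text (still containing any
/// misread "S" characters), or nil if no pattern matches the line.
func findPriceText(in line: String) -> String? {
    for pattern in pricePatterns {
        if let range = line.range(of: pattern, options: .regularExpression) {
            // The whitespace-led patterns include leading spaces in the match.
            return line[range].trimmingCharacters(in: .whitespaces)
        }
    }
    return nil
}
```

Note that with the wildcard pattern the returned text still includes the garbled symbol (for example `E0.S0`), so you would need to decide how to treat that first character before replacing the S characters.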
Sam
  • Nice, thank you. I realised that if I detect each group of characters in the strings separated by line breaks, the price will always be in the last group of each line, so I can use your second regex to confidently replace the S if it appears. Thanks! – Wazza Mar 17 '17 at 15:07
  • For any others looking, the Swift syntax needs \\ instead of \ – Wazza Mar 17 '17 at 15:34
  • Right - you need to escape the escape character to properly escape the period. – Sam Mar 17 '17 at 15:43
  • Hi @Sam, I've just tried using your second regex example as an extension of String (see update in my question) but it's not replacing any occurrences of the 'S' in the string. Am I missing something? The string it receives is £505.50, which after OCR comes in as ES05.50. My output is 5.50 each time though. – Wazza Mar 17 '17 at 15:50
  • Hi Wayne, I was suggesting using the regex to identify the string you need to check. Once you have identified the S characters that need to be replaced, you could replace them as described here: http://stackoverflow.com/questions/24789515/how-to-replace-nth-character-of-a-string-with-another – Sam Mar 17 '17 at 18:17
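For anyone hitting the same problem as in the question's update: `stringByReplacingMatches(in:range:withTemplate:)` substitutes the template for each whole match, so a template of "5" replaces the entire matched price rather than just the S characters. A possible rework of the extension is sketched below. It assumes the price is the last group on each line (as noted in the first comment), so the pattern here is anchored to the end of the line, which is a variation on the answer's regexes rather than one of them.

```swift
import Foundation

extension String {
    /// Sketch: replaces "S" with "5" only inside a trailing price on each line.
    mutating func replaceWhereFivesShouldBe() {
        // Assumed pattern: digits-or-S, a dot, two digits-or-S, then optional
        // trailing spaces up to the end of the line.
        let pattern = "[0-9S]+\\.[0-9S]{2}[ \\t]*$"
        guard let regex = try? NSRegularExpression(pattern: pattern,
                                                   options: [.anchorsMatchLines]) else {
            return
        }
        let original = self
        let fullRange = NSRange(original.startIndex..., in: original)
        // "S" and "5" have the same length, so the match offsets found in
        // `original` stay valid while `self` is being edited.
        for match in regex.matches(in: original, range: fullRange) {
            guard let range = Range(match.range, in: self) else { continue }
            let fixed = self[range].replacingOccurrences(of: "S", with: "5")
            self.replaceSubrange(range, with: fixed)
        }
    }
}
```

With the `ES05.50` example from the comment, only the `S05.50` part matches, so the garbled currency symbol `E` is left as it is and would need separate handling.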