I have built a web app that lets me take a photo of a business card and converts it to text using Tesseract.js. I only need the email address and the name from that text (they are usually separated by paragraph breaks). Does anyone know an API that extracts this data for me, or a regular expression that would help here? My knowledge of RegEx is very limited. Many thanks in advance for your answers. Cheers

I have already tried to do it on my own, but I keep failing.

  • You might be able to use a regex to grab the email (you can look for the @ symbol), but I think the correct solution here would be to train an ML model to determine what the name and email are. Since different business cards have this information in different spots/orders, I don't think a simple grep or string split will work here. – katamaster818 Oct 04 '19 at 15:08
  • For the email you can read some info here: https://stackoverflow.com/questions/46155/how-to-validate-an-email-address-in-javascript – muka.gergely Oct 04 '19 at 15:30
  • Hello there, thank you for your quick response! An accuracy of around 80% would be fine, since there is also a form available to correct the data if necessary. I have thought about getting a list of the most common first names in the world and checking whether the first name from the card occurs in it. – TheSmallest Oct 04 '19 at 15:32
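
The two ideas from the comments (a regex for the email, a first-name list for the name) could be combined roughly as follows. This is only a minimal sketch under several assumptions: the function names `extractEmail`, `extractName`, and `parseBusinessCard` are hypothetical, the email pattern is a deliberately loose matcher rather than a full validator, and `KNOWN_FIRST_NAMES` is a tiny placeholder for whatever larger name list is actually used. OCR noise (for example a misread `@`) will defeat it, which is why the correction form is still useful.

```javascript
// Loose email pattern (an assumption, not a full RFC 5322 validator):
// something@domain.tld with common characters only.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Placeholder list for illustration; a real app would load a much larger one.
const KNOWN_FIRST_NAMES = new Set(["anna", "john", "maria", "peter"]);

// Returns the first email-looking substring in the OCR text, or null.
function extractEmail(text) {
  const match = text.match(EMAIL_RE);
  return match ? match[0] : null;
}

// Returns the first line whose first word is a known first name, or null.
function extractName(text, knownFirstNames = KNOWN_FIRST_NAMES) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);

  for (const line of lines) {
    // Skip lines that look like emails, phone numbers, or street addresses.
    if (line.includes("@") || /\d/.test(line)) continue;

    // Assume a name line has between two and four words.
    const words = line.split(/\s+/);
    if (words.length < 2 || words.length > 4) continue;

    if (knownFirstNames.has(words[0].toLowerCase())) return line;
  }
  return null;
}

// Combines both heuristics into one result object.
function parseBusinessCard(text) {
  return { name: extractName(text), email: extractEmail(text) };
}
```

Calling `parseBusinessCard` with the recognized text returns an object with `name` and `email`, either of which may be `null` when the heuristic finds nothing, so the form can be pre-filled and then corrected by hand.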

0 Answers