Since yesterday I've been trying to use the OCR library pytesser. I solved a few problems by myself, but I can't figure out how to get rid of this one. Here is the error:

H:\Python27>python.exe lol.py
Traceback (most recent call last):
File "lol.py", line 30, in <module>
print image_to_string(image)
File "H:\Python27\lib\pytesser\__init__.py", line 30, in image_to_string
call_tesseract(scratch_image_name, scratch_text_name_root)
File "H:\Python27\lib\pytesser\__init__.py", line 20, in call_tesseract
proc = subprocess.Popen(args)
File "H:\Python27\lib\subprocess.py", line 710, in __init__
errread, errwrite)
File "H:\Python27\lib\subprocess.py", line 958, in _execute_child
 startupinfo)
WindowsError: [Error 2] Le fichier spécifié est introuvable

The last line says "The specified file cannot be found".

Here is how I set the Tesseract path in my __init__.py:

tesseract_exe_name = 'C:\Users\TyLo\AppData\Local\Tesseract-OCR\tesseract' # Name of executable to be called at command line

I really can't figure out why it can't open the file. There are two other settings in my __init__.py as well: the image file and the text file. I tried creating my own and giving it their paths, with no success, but I think it creates them itself.

scratch_image_name = "outfile.bmp" # This file must be .bmp or other Tesseract-compatible format
scratch_text_name_root = "infile" # Leave out the .txt extension

These are the 3 values that are passed to Popen, so I imagine the error is there.

I hope I'm clear enough for you to understand the problem I have.

Edit: the code in lol.py is from this site; I just modified the URL: http://www.debasish.in/2012/01/bypass-captcha-using-python-and.html

TyLo
  • I'm pretty sure this must be a dup, but it's a hard thing to search for… any questioner that knew enough to use the right terms in his question would know enough to not have the problem… – abarnert May 05 '15 at 19:14
  • [This one](http://stackoverflow.com/questions/28706163/python-3-4-1-script-syntax-error-arcpy/28706216#28706216) has the same ultimate problem, and a good answer from Martijn Pieters, but I don't think it'll make sense to a novice that they're related… – abarnert May 05 '15 at 19:18

1 Answer

This is the problem:

tesseract_exe_name = 'C:\Users\TyLo\AppData\Local\Tesseract-OCR\tesseract' # Name of executable to be called at command line

See the \t there, in \tesseract? That's a single tab character, not a backslash followed by a t. You only get away with \U, \T, \A, and \L because you got lucky: your version of Python doesn't recognize them as escape sequences in a plain string literal, so it leaves the backslash in place. (Python 3 does have a use for \U: it starts a Unicode escape, so this same literal is a syntax error there.)
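A quick way to see the difference for yourself, as a minimal sketch. The shortened path fragment is only for illustration; it is not the full path from the question.

```python
# Illustrative fragment only; not the full install path.
broken = 'Tesseract-OCR\tesseract'   # \t is parsed as one tab character
fixed = r'Tesseract-OCR\tesseract'   # raw literal keeps the backslash and the t

has_tab = '\t' in broken               # the tab really is in the string
has_backslash = '\\' in broken         # and there is no backslash in it at all
length_gap = len(fixed) - len(broken)  # backslash + t versus a single tab

print(has_tab)
print(has_backslash)
print(length_gap)
```

Printing the strings with repr() is less helpful here, because repr() displays a tab as \t, which looks just like the source you typed.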

The solution is to do one of the following:

(1) Use a raw string literal

tesseract_exe_name = r'C:\Users\TyLo\AppData\Local\Tesseract-OCR\tesseract' # Name of executable to be called at command line

The r'…' means "don't treat backslashes specially".

(2) Escape all of your backslashes:

tesseract_exe_name = 'C:\\Users\\TyLo\\AppData\\Local\\Tesseract-OCR\\tesseract' # Name of executable to be called at command line

In a non-raw string literal, \\ means a single backslash, so \\t means a single backslash and a t.

(3) Use forward slashes instead:

tesseract_exe_name = 'C:/Users/TyLo/AppData/Local/Tesseract-OCR/tesseract' # Name of executable to be called at command line

Most Windows programs accept forward slashes. A few don't, and occasionally you need a \\.\ pathname that isn't legal with forward slashes, but otherwise, this works.
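To check that the three spellings above name the same path, here is a small sketch. It uses the standard-library ntpath module (the Windows flavor of os.path) so it gives the same result on any platform; the variable names are just for this example.

```python
import ntpath  # Windows path rules, importable on any OS

raw = r'C:\Users\TyLo\AppData\Local\Tesseract-OCR\tesseract'
escaped = 'C:\\Users\\TyLo\\AppData\\Local\\Tesseract-OCR\\tesseract'
forward = 'C:/Users/TyLo/AppData/Local/Tesseract-OCR/tesseract'

# Options 1 and 2 produce exactly the same string.
same_literal = (raw == escaped)

# Option 3 is a different string, but it normalizes to the same Windows path.
same_after_norm = (ntpath.normpath(forward) == raw)

print(same_literal)
print(same_after_norm)
```

This only compares the strings; it does not check that the executable actually exists at that location.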

abarnert
  • great answer but the double backslashes are missing from #2. Maybe quad escapes in the markup? – tdelaney May 05 '15 at 19:14
  • @tdelaney: For some reason, it's not showing those code blocks as code blocks. Maybe having them inside a numbered block is confusing it? I'll try to fix it… – abarnert May 05 '15 at 19:15
  • @tdelaney: I couldn't figure it out, so I just faked the numbered list, and now it works. :) – abarnert May 05 '15 at 19:16
  • Oh, I didn't think of that, thank you! That was the solution to my problem :) But it doesn't print the captcha I sent it as a string, which is weird. – TyLo May 05 '15 at 19:20
  • @TyLo: If you have multiple separate problems, even if they're in the same program, create a separate question for each. (You can use the `share` button to get a link to this question to paste into your other question so you can only repeat the important stuff and leave all the background out.) – abarnert May 05 '15 at 19:21
  • OK, I understand. I will try to fix it myself first :) Thank you again for your help. – TyLo May 05 '15 at 19:24
  • @TyLo: Sure, trying to fix it yourself is even better than just opening another question immediately. :) – abarnert May 05 '15 at 19:25