
After installing the pytesseract package with "pip install" on Google Colab, I need to install OCR trained data for another language, but I do not know where to copy it.

Also, if I install a package myself with "pip install", where is the package located on my Windows PC?

  • Are you installing the package on Google Colab or on your local machine? – Sundeep Pidugu Sep 17 '19 at 06:23
  • If you are looking for a traineddata file, this will probably help: https://stackoverflow.com/questions/55036633/how-to-create-traineddata-file-for-tesseract-4-1-0 – Sundeep Pidugu Sep 17 '19 at 06:25
  • I am using Google Colab in the Chrome browser on my Windows PC. If I use "pip install", is the package installed on my PC's C: drive, or somewhere else such as Google Drive? – Kiyoung Cho Sep 17 '19 at 06:53

2 Answers


If you want to install Arabic, for example, in Google Colab:

Download the file:

! wget https://raw.githubusercontent.com/tesseract-ocr/tessdata_best/master/ara.traineddata

Then move it to the tessdata path:

! sudo mv "/content/ara.traineddata" "/usr/share/tesseract-ocr/4.00/tessdata"

Then pass the parameter lang='ara' to pytesseract.

Full code:

from PIL import Image
import pytesseract
image_path_in_colab = "/content/غلاف-الكتاب.jpg"
extract = pytesseract.image_to_string(Image.open(image_path_in_colab), lang='ara')  # 'ara' = Arabic traineddata
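
Before calling pytesseract it can help to confirm that the traineddata file really ended up in the tessdata directory. The snippet below is only a sketch of that check: `has_language` and `DEFAULT_TESSDATA` are names I made up for illustration, and the default path is simply the one used in the `mv` command above, so adjust it if your Tesseract version stores its data elsewhere.

```python
from pathlib import Path

# Assumption: the tessdata directory used in the `mv` command above.
DEFAULT_TESSDATA = Path("/usr/share/tesseract-ocr/4.00/tessdata")


def has_language(lang: str, tessdata_dir: Path = DEFAULT_TESSDATA) -> bool:
    """Return True if <lang>.traineddata exists as a file in tessdata_dir."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    # In Colab, run this after the wget and mv steps.
    print("Arabic data installed:", has_language("ara"))
```

If this prints False, the download or the move probably failed, or the tessdata directory is different on your runtime.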
Farid

Installing a package on Google Colab does not install it on the local drive of the machine you are using. Starting a Colab environment creates a remote drive, and that is where you can browse all the project files.

If you want to know the installation path of a specific pip package, you can always use

!pip show pytesseract

The "Location:" field in the output shows where the package is installed, and you can then add any necessary files to that directory.
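
The same lookup can also be done from Python instead of the shell. This is a rough sketch using the standard library's `importlib.util.find_spec`; the helper name `package_location` is my own. Note that it returns the package's own folder, whereas "Location:" in `pip show` is the parent site-packages (or dist-packages) directory.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the directory holding the importable top-level module `name`.

    Returns None if the module is not installed or has no file on disk
    (for example, built-in modules such as `sys`).
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Prints None if pytesseract is not installed in this environment.
    print(package_location("pytesseract"))
```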

Sundeep Pidugu
  • Thank you^^ It looks like a Linux OS on the remote drive. I checked the OS version using "! grep . /etc/issue*" and saw the version info "/etc/issue:Ubuntu 18.04.3 LTS \n \l /etc/issue.net:Ubuntu 18.04.3 LTS". – Kiyoung Cho Sep 18 '19 at 02:39
  • Please upvote and accept the answer if it was useful, as it will be useful for others. – Sundeep Pidugu Sep 18 '19 at 02:40
  • Still a question: after typing "! pip show pytesseract", I saw the message "Location: /usr/local/lib/python3.6/dist-packages". So I typed "! cd /usr/local/lib/python3.6/dist-packages" and "pwd", but it says "/content"... I cannot change directory by typing, e.g., "! cd .." and so on. How can I access the directory "/usr/local/lib/python3.6/dist-packages"? Thanks in advance^^ – Kiyoung Cho Sep 18 '19 at 02:48
  • If you want to go to the directory, click on Files (top left) -> `..(Up one level)`. That takes you to the parent directory, and you can access your files there. – Sundeep Pidugu Sep 18 '19 at 03:08
  • Easy method, thanks^^ I also found that I have to use "% cd" instead of "! cd" in this virtual environment. – Kiyoung Cho Sep 18 '19 at 04:25
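
For anyone wondering why "! cd" has no lasting effect: each "!" command runs in its own child shell, so the directory change disappears when that shell exits, while "%cd" changes the working directory of the notebook process itself. The sketch below reproduces the same behaviour in plain Python; `cwd_after_shell_cd` is a made-up helper name, and `subprocess` with `shell=True` stands in for what "!" does.

```python
import os
import shlex
import subprocess


def cwd_after_shell_cd(target: str) -> str:
    """Run `cd target` in a child shell, then return THIS process's cwd.

    The child shell changes its own directory and exits, so the
    directory of the calling process is left untouched.
    """
    subprocess.run(f"cd {shlex.quote(target)}", shell=True, check=True)
    return os.getcwd()


if __name__ == "__main__":
    before = os.getcwd()
    # Same result as "! cd /tmp" followed by "pwd": still the old directory.
    print(cwd_after_shell_cd("/tmp") == before)
    # os.chdir acts on the current process, which is what "%cd" does.
    os.chdir("/tmp")
    print(os.getcwd())
```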