5

When running `pip install paddleocr`, I get an error while building the wheel for PyMuPDF.

Building wheels for collected packages: PyMuPDF
Building wheel for PyMuPDF (setup.py) ... error
error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [70 lines of output]



Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\3551\AppData\Local\Temp\pip-install-ip72hta1\pymupdf_f7a2c6bc313a492fa6c66ad0817a4616\setup.py", line 487, in <module>
          mupdf_local = get_mupdf()
                        ^^^^^^^^^^^
        File "C:\Users\3551\AppData\Local\Temp\pip-install-ip72hta1\pymupdf_f7a2c6bc313a492fa6c66ad0817a4616\setup.py", line 450, in get_mupdf
          return tar_extract( mupdf_tgz, exists='return')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\3551\AppData\Local\Temp\pip-install-ip72hta1\pymupdf_f7a2c6bc313a492fa6c66ad0817a4616\setup.py", line 183, in tar_extract
          t.extractall()
        File "C:\Users\3551\AppData\Local\Programs\Python\Python311\Lib\tarfile.py", line 2059, in extractall
          self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
        File "C:\Users\3551\AppData\Local\Programs\Python\Python311\Lib\tarfile.py", line 2100, in extract
          self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
        File "C:\Users\3551\AppData\Local\Programs\Python\Python311\Lib\tarfile.py", line 2173, in _extract_member
          self.makefile(tarinfo, targetpath)
        File "C:\Users\3551\AppData\Local\Programs\Python\Python311\Lib\tarfile.py", line 2214, in makefile
          with bltn_open(targetpath, "wb") as target:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      FileNotFoundError: [Errno 2] No such file or directory: '.\\mupdf-1.20.3-source\\thirdparty\\harfbuzz\\test\\shaping\\texts\\in-house\\shaper-indic\\script-devanagari\\utrrs\\codepoint\\IndicFontFeatureCodepoint-AdditionalConsonants.txt'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for PyMuPDF
  Running setup.py clean for PyMuPDF
Failed to build PyMuPDF
ERROR: Could not build wheels for PyMuPDF, which is required to install pyproject.toml-based projects

I tried running `pip install wheel`, then installing PyMuPDF on its own with `pip install PyMuPDF`, and then installing paddleocr with `pip install paddleocr`. The same problem remains: the wheel for PyMuPDF fails to build.

I am using a 64-bit Intel i3 processor, and my Python version is 3.11.3.

Jinen Rathore
  • Please don't post images of text. Copy/paste the complete error message into your question, using the [edit] feature. Then put it in code formatting by highlighting it and clicking the `{}` button. This way, it is much more readable and accessible, and more people will actually read your question through – FlyingTeller Jun 01 '23 at 07:11
  • Also include information about your OS and python setup – FlyingTeller Jun 01 '23 at 07:12
  • Thank you for your suggestion I updated my question as per it – Jinen Rathore Jun 01 '23 at 07:25
  • Are you fixed on using python 3.11 or would you be willing to switch to a lower version? – FlyingTeller Jun 01 '23 at 07:28
  • `paddleocr` has the requirement `PyMuPDF<1.21.0` and `PyMuPDF==1.20.2` (the latest version that fits the `paddleocr` requirement) only has whl files up to python 3.10 – FlyingTeller Jun 01 '23 at 07:34
  • If it is absolutely necessary I can switch to lower version of python, else I would like to stick to 3.11 so to make it compatible I need to switch back to python 3.10. Thank you very much – Jinen Rathore Jun 01 '23 at 07:35
  • can you check which version your system is trying to install, there should be a line in your output that says something about `pymupdfXXX.tar.gz` where XXX is the version number? – FlyingTeller Jun 01 '23 at 07:47
  • Try to download https://mupdf.com/downloads/archive/mupdf-1.22.0-source.tar.gz extract it and then set the environment variable `PYMUPDF_SETUP_MUPDF_BUILD` to the path of the extracted `mupdf-1.22.0` location. Then try the installation again – FlyingTeller Jun 01 '23 at 08:13
  • it is trying to install `pymupdf 1.20.2 .tar.gz` – Jinen Rathore Jun 01 '23 at 08:38
  • Should download https://mupdf.com/downloads/archive/mupdf-1.20.3-source.tar.gz and setup the environment variable as described above – FlyingTeller Jun 01 '23 at 08:40
  • Or try to download and install the whl file from https://drive.google.com/drive/folders/1PESjDkovpvnrWFTKji4-qgT3rcVz-o-F?usp=sharing and then do `pip install `. it should install `pymupdf`, then you can `pip install paddleocr` – FlyingTeller Jun 01 '23 at 08:55
  • Downloading the wheel file and install worked for me thank you very much!! – Jinen Rathore Jun 01 '23 at 09:06
  • @FlyingTeller yours is the answer I needed. Please write it as an answer, not just a comment. Thanks! – Esraa Abdelmaksoud Jun 04 '23 at 19:34
  • @EsraaAbdelmaksoud done – FlyingTeller Jun 05 '23 at 06:03

2 Answers

3

`paddleocr` has the requirement `PyMuPDF<1.21.0`, and `PyMuPDF==1.20.2` (the latest version that fits the `paddleocr` requirement) only has whl files up to Python 3.10. Therefore, on Python 3.11, pip falls back to building PyMuPDF from source.
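The fallback can be illustrated with a minimal sketch. It assumes, as stated above, that Python 3.10 is the newest version with a prebuilt `PyMuPDF==1.20.2` wheel. The name `needs_source_build` and the constant are illustrative only and are not part of pip or PyMuPDF; real wheel selection also depends on platform and ABI tags, which this ignores.

```python
import sys

# Assumption taken from the answer text: PyMuPDF 1.20.2 only has
# prebuilt wheels up to Python 3.10.
NEWEST_PYTHON_WITH_WHEEL = (3, 10)


def needs_source_build(python_version=None,
                       newest_with_wheel=NEWEST_PYTHON_WITH_WHEEL):
    """Return True if the interpreter is newer than any available wheel.

    This is a simplified model: it compares (major, minor) only and
    ignores platform and ABI tags.
    """
    if python_version is None:
        python_version = sys.version_info
    return tuple(python_version[:2]) > tuple(newest_with_wheel)
```

Calling `needs_source_build((3, 11, 3))` models the setup in the question, while `needs_source_build((3, 10, 0))` models an interpreter for which a wheel exists.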

The error itself comes from PyMuPDF's setup script: it downloads the MuPDF source archive, one of its build dependencies, and then fails while extracting the .tar.gz file. You have the following options:

  1. Manually download https://mupdf.com/downloads/archive/mupdf-1.20.3-source.tar.gz and extract the archive to a path of your choosing. Set the environment variable `PYMUPDF_SETUP_MUPDF_BUILD` to the path of the extracted mupdf-1.20.3 folder, then run `pip install PyMuPDF==1.20.2`. Note that this approach also requires a working compiler.

  2. Download this unofficial whl file: https://drive.google.com/drive/folders/1PESjDkovpvnrWFTKji4-qgT3rcVz-o-F?usp=sharing and install it with `pip install <path to the whl file>`.
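Both options can be sketched in Python as helpers that only assemble the pip command, plus the environment for option 1, without running anything. The function names are hypothetical; the only names taken from the text above are the `PYMUPDF_SETUP_MUPDF_BUILD` variable and the `PyMuPDF==1.20.2` pin.

```python
import os
import sys


def pymupdf_source_install(mupdf_dir, pymupdf_version="1.20.2"):
    """Option 1: return (cmd, env) for building PyMuPDF against a local MuPDF tree.

    mupdf_dir is the folder you extracted the MuPDF source archive into.
    """
    env = dict(os.environ)  # copy, so the current process is not modified
    env["PYMUPDF_SETUP_MUPDF_BUILD"] = os.fspath(mupdf_dir)
    cmd = [sys.executable, "-m", "pip", "install",
           f"PyMuPDF=={pymupdf_version}"]
    return cmd, env


def wheel_install(wheel_path):
    """Option 2: return the pip command for installing a downloaded whl file."""
    return [sys.executable, "-m", "pip", "install", os.fspath(wheel_path)]
```

To actually install, the result could be passed to `subprocess.run(cmd, env=env, check=True)` for option 1, or `subprocess.run(wheel_install(path), check=True)` for option 2, where the directory and wheel path are whatever you chose locally.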

FlyingTeller
0

For developers facing this issue on macOS: run `pip install PyMuPDF==1.20.0`, since PaddleOCR requires `PyMuPDF<1.21.0`.

If you still get `ERROR: Failed building wheel for PyMuPDF`, run `brew install swig` first and then retry `pip install PyMuPDF==1.20.0`; that worked for me.
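As a rough sketch of the macOS steps above, the helper below lists the commands to run, adding `brew install swig` only when no `swig` executable is found on `PATH`. The function name is hypothetical, and checking for swig up front (rather than after a failed build) is a simplification of the order described in this answer.

```python
import shutil


def macos_install_commands(swig_found=None, pymupdf_version="1.20.0"):
    """Return the shell commands, as argument lists, for the macOS workaround."""
    if swig_found is None:
        # shutil.which returns None when the executable is not on PATH
        swig_found = shutil.which("swig") is not None
    commands = []
    if not swig_found:
        commands.append(["brew", "install", "swig"])
    commands.append(["python3", "-m", "pip", "install",
                     f"PyMuPDF=={pymupdf_version}"])
    return commands
```

Each returned list can be passed to `subprocess.run(..., check=True)` in order.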