
I recently updated PyMuPDF (fitz) and changed my code to use the new method names (see PyMuPDF > Deprecated Names).

Problem: when I call a function I wrote that uses fitz's Page.get_text(), the method sometimes exists only under the deprecated name (getText) and sometimes only under the new name (get_text). In both cases fitz is imported in the same script; the difference is that in one case a second script imports the script that imports fitz (via from fitz import fitz).

How can I gain visibility into where these different sets of attributes are introduced?

Steps taken so far to disentangle and debug: I confirmed that the PyMuPDF version is the same in both situations (via this StackOverflow answer) with: print(pkg_resources.get_distribution('PyMuPDF').version) => 1.21.0
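A side note on the version check: `pkg_resources` is deprecated in current setuptools, and the standard library's `importlib.metadata` (Python 3.8+) reports the same information. Below is a minimal sketch; `installed_version` is a hypothetical helper name. Keep in mind that this reads the installed distribution's metadata. It does not tell you which file `import fitz` actually loaded, so the two can disagree if another module named `fitz` is found earlier on `sys.path`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No distribution with this name is installed in the current environment.
        return None


if __name__ == "__main__":
    # In the question's environment this reported 1.21.0 in both situations.
    print(installed_version("PyMuPDF"))
```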

I checked the attributes of the object with: print(dir(page))
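`dir(page)` shows which names exist, but not where the module providing them came from. A more targeted check, sketched here under the assumption that the two scripts end up importing different `fitz` module objects, is to print the loaded module's file and the presence of both method names right before the failing call. The helpers `where_is`, `loaded_modules_matching`, and `which_names` are hypothetical; only standard-library calls are used.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a module was (or would be) loaded from, or None."""
    mod = sys.modules.get(module_name)
    if mod is not None:
        # Already imported: report the file that is actually in use.
        return getattr(mod, "__file__", None)
    # Not imported yet: report where the import system would find it.
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def loaded_modules_matching(fragment):
    """Map already-imported module names containing `fragment` to their files."""
    return {
        name: getattr(mod, "__file__", None)
        for name, mod in list(sys.modules.items())
        if fragment in name
    }


def which_names(obj, names):
    """Report which of the given attribute names exist on obj."""
    return {name: hasattr(obj, name) for name in names}


if __name__ == "__main__":
    # In each script, right before the failing line, something like:
    #   print(where_is("fitz"), loaded_modules_matching("fitz"))
    #   print(which_names(page, ["get_text", "getText"]))
    print(where_is("json"))
    print(which_names("abc", ["upper", "getText"]))
```

Comparing the output of the two scripts should show whether `fitz`, `fitz.fitz`, or some other module of that name is in play in each case.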

The pattern is consistent and depends on which script I use to call my function:

#1, with getText (deprecated): [screenshot: version 1.21.0, getText listed as an attribute]

#2, with get_text: [screenshot: version 1.21.0, get_text listed as an attribute]

Current solution:

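    # doc is an open fitz Document, e.g. doc = fitz.open(path)
    text = ""
    for page in doc:
        try:
            text += page.get_text()  # new-style name
        except AttributeError:
            text += page.getText()  # deprecated camelCase name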
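One caveat with the try/except version: an `AttributeError` raised *inside* `get_text()` would also be caught and silently retried with `getText`, which can hide the real error. The sketch below looks the method up explicitly with `getattr` instead; `page_text` is a hypothetical helper name, and the dummy class only stands in for a PyMuPDF `Page`.

```python
def page_text(page):
    """Return the page's text via whichever method name the object exposes.

    Tries the new-style name first, then the deprecated camelCase one.
    """
    for name in ("get_text", "getText"):
        method = getattr(page, name, None)
        if callable(method):
            return method()
    raise AttributeError(
        f"{type(page).__name__!r} object has neither 'get_text' nor 'getText'"
    )


if __name__ == "__main__":
    # Usage with a PyMuPDF document (not run here):
    #   text = "".join(page_text(page) for page in doc)
    class _OldStylePage:
        def getText(self):
            return "hello"

    print(page_text(_OldStylePage()))  # prints: hello
```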

1 Answer


There are a few ways to handle situations around name deprecation:

  1. You can enable old names by executing fitz.restore_aliases() right after import. This will make old names available again (but keep the new ones too).

  2. You can run a utility we have provided. It accepts folders (or single Python files) as parameters, walks through the source code, and replaces old names with new ones. You can also request backups of your Python files. Folders are processed recursively.
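For option 1, here is a defensive sketch (the helper `enable_old_names` is hypothetical) that calls `restore_aliases()` only when the imported `fitz` object actually provides it, and otherwise prints where that object was loaded from. Given the `AttributeError: module 'fitz' has no attribute 'restore_aliases'` reported in the comments, this may help show whether a script picked up a different `fitz` module than expected. The `types.ModuleType` object in the demo is only a stand-in.

```python
import types


def enable_old_names(fitz_module):
    """Call fitz_module.restore_aliases() if it exists.

    Returns True if the call was made, False if the attribute is missing.
    """
    restore = getattr(fitz_module, "restore_aliases", None)
    if restore is None:
        # Show which object/file was imported as "fitz" to aid debugging.
        origin = getattr(fitz_module, "__file__", repr(fitz_module))
        print(f"restore_aliases() not found; fitz loaded from: {origin}")
        return False
    restore()  # per the answer: old names become available, new ones stay
    return True


if __name__ == "__main__":
    # Real usage (requires PyMuPDF):
    #   import fitz
    #   enable_old_names(fitz)
    demo = types.ModuleType("fake_fitz")  # stand-in object for illustration
    print(enable_old_names(demo))
```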

Jorj McKie
  • Parts of your post confused me: are you saying that some **_new_** PyMuPDF names are equal to old names, but now have a different meaning? – Jorj McKie Jan 27 '23 at 11:35
  • Sorry. Before implementing my solution I would sometimes get this error message: "text += page.get_text() AttributeError: 'Page' object has no attribute 'get_text'", and if I switched back to the old style I'd get this one: "text += page.getText() AttributeError: 'Page' object has no attribute 'getText'". I'm just curious how the same version seems to have both sets of attributes and how I can find the source of that issue. I thought I was somehow overriding the variable names, but the objects seem to exist in different states. – danielsgriffin Jan 27 '23 at 16:23
  • Thank you @jorj-mckie. I imagine I just have some weird cruft somehow reverting PyMuPDF objects to the deprecated version. I just want to understand why. I'll work on creating a minimal replication. If I use `fitz.restore_aliases()` and run script_b from my terminal, I get `AttributeError: module 'fitz' has no attribute 'restore_aliases'`, because it somehow sees only the old version. If I use it and run script_a (with the try/except removed), it runs while printing: `Deprecation: 'getText' removed from class 'Page' after v1.19 - use 'get_text'.` This is my only call to fitz in my code. – danielsgriffin Jan 27 '23 at 19:37