
I installed spaCy 3.0 on Ubuntu. I used Ctrl+B to jump to the definition of the class "Sentencizer", which is in a sentencizer.py file:

class Sentencizer(__spacy_pipeline_pipe.Pipe):
    """
    Segment the Doc into sentences using a rule-based strategy.
    
        DOCS: https://spacy.io/api/sentencizer
    """
    def from_bytes(self, bytes_data, *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__ 
        """
        Sentencizer.from_bytes(self, bytes_data, *, exclude=tuple())
        Load the sentencizer from a bytestring.
        
                bytes_data (bytes): The data to load.
                returns (Sentencizer): The loaded object.
        
                DOCS: https://spacy.io/api/sentencizer#from_bytes
        """
        pass
...

Why do the functions defined in sentencizer.py have no content? In the spaCy GitHub repo there is no sentencizer.py file; the class "Sentencizer" is defined in sentencizer.pyx:

class Sentencizer(Pipe):
    """Segment the Doc into sentences using a rule-based strategy.
    DOCS: https://spacy.io/api/sentencizer
    """

    default_punct_chars = ['!', '.', '?', '։', '؟', '۔', '܀', '܁', '܂', '߹',
            '।', '॥', '၊', '။', '።', '፧', '፨', '᙮', '᜵', '᜶', '᠃', '᠉', '᥄',
            '᥅', '᪨', '᪩', '᪪', '᪫', '᭚', '᭛', '᭞', '᭟', '᰻', '᰼', '᱾', '᱿',
            '‼', '‽', '⁇', '⁈', '⁉', '⸮', '⸼', '꓿', '꘎', '꘏', '꛳', '꛷', '꡶',
            '꡷', '꣎', '꣏', '꤯', '꧈', '꧉', '꩝', '꩞', '꩟', '꫰', '꫱', '꯫', '﹒',
            '﹖', '﹗', '!', '.', '?',
            # ... further punctuation characters here did not survive copying ...
            '。', '。']

Why are the installed files different from the GitHub repo? Thanks!

zpeng
When you say you use Ctrl+B to find the definition, what software are you using for that? Also, what's your full spaCy version: 3.0.3 or an earlier one? – polm23 Mar 06 '21 at 10:26

1 Answer


When developing a Python library, small changes are saved in Git as they're made, but they're only released to PyPI when the maintainer intentionally makes a release. So it's normal for the files on your computer to be a little different from the files in a git repo, even if you have a very recent release.
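If you want to compare like with like, look up the exact version you have installed and browse the repo at that release instead of at the default branch. A minimal sketch, assuming Python 3.8+ (for importlib.metadata) and assuming spaCy's release tags follow a "v3.0.3"-style naming pattern; installed_version and tag_url are made-up helper names for this example:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it isn't installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def tag_url(dist_version: str, repo: str = "explosion/spaCy") -> str:
    """Build a GitHub URL for a release tag.

    Assumption: tags are named "v" + version (e.g. v3.0.3).
    """
    return f"https://github.com/{repo}/tree/v{dist_version}"


if __name__ == "__main__":
    v = installed_version("spacy")
    if v is None:
        print("spaCy is not installed in this environment")
    else:
        print(f"Installed spaCy {v}; compare against {tag_url(v)}")
```

Even at the matching tag, some differences can remain, because what you install is a built package rather than a checkout of the repo.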

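I was confused by the sentencizer.py code you posted, since there doesn't seem to have ever been a file with that name in spaCy. It looks like a PyCharm feature: rather than showing you the actual source code, PyCharm generates a stub (a "skeleton") by inspecting the compiled module. That way it can recover class names, method names and docstrings, but not the method bodies, which is why every function is just "pass". The comment "real signature unknown; NOTE: unreliably restored from __doc__" in your snippet is a sign of this: the signature was guessed from the docstring, not read from source.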
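To illustrate why a stub like that has names and docstrings but empty bodies, here is a toy version of the idea. It is not PyCharm's real generator, just a sketch under the assumption that only runtime metadata is available (names, signatures when Python can determine them, docstrings); make_stub is a hypothetical helper:

```python
import inspect


def make_stub(cls: type) -> str:
    """Build a skeleton for cls from runtime metadata only (names, signatures, docstrings)."""
    lines = [f"class {cls.__name__}:"]
    doc = inspect.getdoc(cls)
    if doc:
        # Keep only the first docstring line so indentation stays simple.
        lines.append(f'    """{doc.splitlines()[0]}"""')
    for name, member in sorted(vars(cls).items()):
        # Skip dunder attributes and anything that isn't callable.
        if name.startswith("__") or not callable(member):
            continue
        try:
            sig = str(inspect.signature(member))
            note = ""
        except (TypeError, ValueError):
            # Some compiled callables don't expose a signature at runtime.
            sig = "(self, *args, **kwargs)"
            note = "  # real signature unknown"
        lines.append(f"    def {name}{sig}:{note}")
        member_doc = inspect.getdoc(member)
        if member_doc:
            lines.append(f'        """{member_doc.splitlines()[0]}"""')
        # This sketch never reads the implementation, so the body is just "pass".
        lines.append("        pass")
    return "\n".join(lines)


if __name__ == "__main__":
    class Greeter:
        """Says hello."""

        def greet(self, name):
            """Return a greeting."""
            return "hello " + name

    print(make_stub(Greeter))
```

For a compiled module the implementation really isn't available as Python source, so a tool working this way can only ever produce bodies like the ones in your snippet.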

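You noticed that spaCy has a sentencizer.pyx file. That is Cython source: when spaCy is built, Cython translates it to C, which is then compiled into a binary extension module (a .so file on Linux). That binary is what Python actually loads when you use the code. PyCharm is presumably working backwards from the .so file.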
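If you want to check what Python is actually loading for a module, importlib can report where it comes from without importing the module itself (though for dotted names it does import the parent packages). A small sketch; module_kind is a hypothetical helper name. On a spaCy 3.0 install I would expect spacy.pipeline.sentencizer to be reported as a compiled extension, but treat that as an assumption rather than something this code guarantees:

```python
import importlib.machinery
import importlib.util


def module_kind(name: str) -> str:
    """Report what kind of file (if any) Python would load for module `name`."""
    try:
        # For dotted names, find_spec imports the parent packages first.
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return "not found"
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    if origin in ("built-in", "frozen"):
        return origin
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "compiled extension"  # e.g. a .so built from Cython
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return "python source"
    return "other"


if __name__ == "__main__":
    for mod in ("json", "sys", "spacy.pipeline.sentencizer"):
        print(mod, "->", module_kind(mod))
```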

polm23