Python: Chunking phrases other than noun phrases (e.g. prepositional phrases) using spaCy, etc.

Question

Since I was told spaCy is such a powerful Python module for natural language processing, I am now desperately looking for a way to group words into more than just noun phrases, most importantly prepositional phrases. I doubt there is a spaCy function for this, but that would be the easiest way, I guess (spaCy is already imported in my project). Nevertheless, I'm open to any approach to phrase recognition/chunking.

Can you give an example of what you want specifically? Maybe an example input with the desired output corresponding to it. – Harrison, Aug 23 '16 at 12:07
Of course. As a translation of a German input, take a sentence like "How long does it take me to drive to the university?" (in German: "Wie lange brauche ich bis zur Uni?"). I want "to [PREP] the [DET] university [NOUN]" to be chunked as a prepositional phrase, either by the tool knowing roughly what a prepositional phrase consists of or by stating exact rules (PP -> PREP + NP), as is done in other Python modules. Since spaCy is used for tagging in my program and seems to support only noun chunking, I would like a supporting module, or just a function inside spaCy, that recognizes additional chunks. – Malte Ge, Aug 23 '16 at 13:25

Emiel · Answer 1 · 2020-02-22T10:55:55.300

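Here's a solution for getting PPs. In general, you can get the phrase below any token from its `subtree` attribute, which yields the token and all of its syntactic descendants in sentence order.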

def get_pps(doc):
    """Return the prepositional phrases in a parsed spaCy Doc as strings."""
    pps = []
    for token in doc:
        # The subtree of a preposition (ADP) covers the preposition and
        # everything it governs. Try this with other parts of speech for
        # different subtrees.
        if token.pos_ == 'ADP':
            pp = ' '.join(tok.text for tok in token.subtree)
            pps.append(pp)
    return pps

Usage:

import spacy

nlp = spacy.load('en_core_web_sm')
ex = 'A short man in blue jeans is working in the kitchen.'
doc = nlp(ex)

print(get_pps(doc))

This prints:

['in blue jeans', 'in the kitchen']
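
If you would rather state explicit rules, as the question suggests (PP -> PREP + NP), here is a minimal sketch that works on a flat list of (word, tag) pairs instead of the dependency tree. The names `chunk_pps`, `NP_MODIFIERS` and `NP_HEADS` are made up for this example, the NP rule is a rough approximation (optional DET/ADJ/NUM modifiers followed by one or more NOUN/PROPN tokens), and the tags below are written by hand. With spaCy you could presumably build the same pairs with `[(t.text, t.pos_) for t in doc]`.

```python
# Universal POS tags that may appear inside the (approximate) noun phrase.
NP_MODIFIERS = {"DET", "ADJ", "NUM"}   # optional words before the head
NP_HEADS = {"NOUN", "PROPN"}           # one or more head words


def chunk_pps(tagged):
    """Return PPs from (word, pos) pairs using the rule PP -> ADP NP."""
    pps = []
    i = 0
    n = len(tagged)
    while i < n:
        if tagged[i][1] == "ADP":
            j = i + 1
            # Skip the optional modifiers of the noun phrase.
            while j < n and tagged[j][1] in NP_MODIFIERS:
                j += 1
            head_start = j
            # Consume the head nouns.
            while j < n and tagged[j][1] in NP_HEADS:
                j += 1
            if j > head_start:  # the rule matched: at least one head noun
                pps.append(" ".join(word for word, _ in tagged[i:j]))
                i = j
                continue
        i += 1
    return pps


# Hand-tagged version of the example sentence (tags are an assumption).
tagged = [
    ("A", "DET"), ("short", "ADJ"), ("man", "NOUN"),
    ("in", "ADP"), ("blue", "ADJ"), ("jeans", "NOUN"),
    ("is", "AUX"), ("working", "VERB"),
    ("in", "ADP"), ("the", "DET"), ("kitchen", "NOUN"), (".", "PUNCT"),
]
print(chunk_pps(tagged))  # ['in blue jeans', 'in the kitchen']
```

Unlike the subtree approach, this flat rule does not nest: something like "in the kitchen of the house" would come out as two separate PPs rather than one.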

edited Feb 22 '20 at 10:55

answered Oct 29 '17 at 11:25

Where's the `nlp()` function from? – Hamman Samuel Aug 08 '18 at 03:25
I've updated the answer. `nlp` refers to a loaded spaCy instance (following the convention from the spaCy docs: https://spacy.io/usage/). – Emiel Aug 09 '18 at 07:23
Thanks. I ran into another issue with `spacy.load('en')`, which was fixed by replacing it with `spacy.load('en_core_web_sm')`. The solution is from a discussion on spaCy's GitHub issue tracker: https://github.com/explosion/spaCy/issues/1721#issuecomment-373241198 – Hamman Samuel Aug 09 '18 at 15:43
Hey, I was wondering if anyone knows how to apply this to a df? – JassiL Feb 20 '20 at 16:01
Generally speaking, you can create a new column based on values from another column using the `apply` method. For example: `df['b'] = df['a'].apply(len)` will create a new column (with label `'b'`) based on the values in the column with label `'a'`, using the built-in `len` function. In other words: the second column will hold the lengths of the items in column `'a'`. You can use any function you like, including the one in the answer. But if the column holds strings, then you do need to process the strings first. – Emiel Feb 22 '20 at 10:52
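
To make the `apply` suggestion concrete, here is a small sketch. The helper `add_phrase_column`, the column names and the `toy_extract` stand-in are all made up for illustration. The stand-in only picks out a few prepositions so that the example runs without a spaCy model; it is not a real PP extractor.

```python
import pandas as pd


def add_phrase_column(df, text_col, extract, new_col="pps"):
    """Return a copy of df with new_col holding extract(text) for each row."""
    out = df.copy()
    out[new_col] = out[text_col].apply(extract)
    return out


def toy_extract(text):
    # Stand-in extractor (an assumption for this sketch): returns the
    # prepositions found in the string, not full prepositional phrases.
    preps = {"in", "on", "to", "at"}
    return [word for word in text.split() if word.lower() in preps]


df = pd.DataFrame({
    "text": [
        "A short man in blue jeans is working in the kitchen.",
        "She drove to the university.",
    ]
})
result = add_phrase_column(df, "text", toy_extract)
print(result["pps"].tolist())  # [['in', 'in'], ['to']]
```

With the function from the answer and a loaded pipeline, the extractor would be something like `lambda text: get_pps(nlp(text))`, since the strings have to be parsed before `get_pps` can use them.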
