Since I was told spaCy is such a powerful Python module for natural language processing, I am now looking for a way to group words into more than just noun phrases, most importantly prepositional phrases. I doubt there is a spaCy function for this, but that would be the easiest way, I guess (spaCy is already imported in my project). Nevertheless, I'm open to any approach to phrase recognition/chunking.
- Can you give an example of what you want specifically? Maybe an example input with the corresponding desired output. – Harrison Aug 23 '16 at 12:07
- Of course. As a translation of a German input, take a sentence like "How long does it take me to drive to the university?" (in German "Wie lange brauche ich bis zur Uni?"). I want "to [PREP] the [DET] university [NOUN]" to be chunked as a prepositional phrase, either by the tool knowing roughly what a prepositional phrase consists of or by stating exact rules (PP -> PREP + NP) as in other Python modules. Since spaCy is used for tagging in my program and seems to support only noun chunking, I would like a supporting module, or just a function inside spaCy, to recognize additional chunks. – Malte Ge Aug 23 '16 at 13:25
1 Answer
Here's a solution to get PPs. In general, you can get phrases using `subtree`.

    def get_pps(doc):
        """Function to get PPs from a parsed document."""
        pps = []
        for token in doc:
            # Try this with other parts of speech for different subtrees.
            if token.pos_ == 'ADP':
                pp = ' '.join([tok.orth_ for tok in token.subtree])
                pps.append(pp)
        return pps
Usage:

    import spacy
    nlp = spacy.load('en_core_web_sm')
    ex = 'A short man in blue jeans is working in the kitchen.'
    doc = nlp(ex)
    print(get_pps(doc))

This prints:

    ['in blue jeans', 'in the kitchen']
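If you would rather state the rule from the question explicitly (PP -> PREP + NP) instead of relying on the parser, something like the following might work. This is only a rough sketch, not a spaCy feature: `chunk_pps` is a hypothetical helper, the tags are Universal POS tags (the same set spaCy exposes as `token.pos_`), and the example sentence is tagged by hand so that the snippet runs without spaCy installed.

```python
# Universal POS tags (the same tag set spaCy exposes as token.pos_).
NP_MODIFIERS = {"DET", "ADJ", "NUM"}
NP_HEADS = {"NOUN", "PROPN", "PRON"}


def chunk_pps(tagged):
    """Return prepositional phrases from a list of (word, pos) pairs.

    Implements the flat rule PP -> ADP (DET|ADJ|NUM)* (NOUN|PROPN|PRON)+.
    """
    pps = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] != "ADP":
            i += 1
            continue
        j = i + 1
        # Skip optional determiners, adjectives and numerals.
        while j < len(tagged) and tagged[j][1] in NP_MODIFIERS:
            j += 1
        head_start = j
        # Require at least one nominal head after the modifiers.
        while j < len(tagged) and tagged[j][1] in NP_HEADS:
            j += 1
        if j > head_start:
            pps.append(" ".join(word for word, _ in tagged[i:j]))
            i = j
        else:
            # An adposition without a following noun phrase is not a PP here.
            i += 1
    return pps


# Hand-tagged example (an assumption; a real tagger may tag differently).
tagged = [
    ("A", "DET"), ("short", "ADJ"), ("man", "NOUN"),
    ("in", "ADP"), ("blue", "ADJ"), ("jeans", "NOUN"),
    ("is", "AUX"), ("working", "VERB"),
    ("in", "ADP"), ("the", "DET"), ("kitchen", "NOUN"), (".", "PUNCT"),
]
print(chunk_pps(tagged))
```

With spaCy you would build the input as `[(tok.text, tok.pos_) for tok in doc]`. Note that this flat rule misses nested PPs and relative clauses, which the `subtree` approach picks up from the dependency parse.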

Emiel
- I've updated the answer. `nlp` refers to a loaded spaCy instance (following the convention from the spaCy docs: https://spacy.io/usage/). – Emiel Aug 09 '18 at 07:23
- Thanks. I ran into another issue with `spacy.load('en')`, which was fixed by replacing it with `spacy.load('en_core_web_sm')`. The solution is from a discussion on spaCy's GitHub issue tracker: https://github.com/explosion/spaCy/issues/1721#issuecomment-373241198 – Hamman Samuel Aug 09 '18 at 15:43
- Generally speaking, you can create a new column based on values from another column using the `apply` method. For example, `df['b'] = df['a'].apply(len)` will create a new column (with label `'b'`) based on the values in the column with label `'a'`, using the built-in `len` function. In other words, the second column will hold the lengths of the items in column `'a'`. You can use any function you like, including the one in the answer. But if the column holds strings, then you do need to process the strings first. – Emiel Feb 22 '20 at 10:52
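To make the `apply` pattern from the last comment concrete, here is a small sketch. The DataFrame contents and the `count_words` function are made up for illustration; `count_words` simply stands in for whatever per-row function you want to use.

```python
import pandas as pd


def count_words(text):
    # Hypothetical stand-in for any per-row function.
    return len(text.split())


df = pd.DataFrame({"a": ["in blue jeans", "in the kitchen"]})

# New column from the built-in len: the string length of each item in 'a'.
df["b"] = df["a"].apply(len)

# New column from a custom function.
df["n_words"] = df["a"].apply(count_words)

print(df)
```

To use `get_pps` from the answer in the same way, each string would presumably have to be parsed first, for example `df['a'].apply(lambda s: get_pps(nlp(s)))`, assuming `nlp` is the loaded spaCy pipeline.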