Questions tagged [text-chunking]
31 questions
12
votes
4 answers
Python (NLTK) - more efficient way to extract noun phrases?
I've got a machine learning task involving a large amount of text data. I want to identify, and extract, noun-phrases in the training text so I can use them for feature construction later on in the pipeline.
I've extracted the type of noun-phrases…

Silent-J
- 322
- 1
- 4
- 15
11
votes
1 answer
How to use nltk regex pattern to extract a specific phrase chunk?
I have written the following regex to tag certain phrase patterns:
pattern = """
P2: {+ ? * + * *}
P1: {? + ? * ? * +}
P3: {}
P4: {}
…

pd176
- 821
- 3
- 10
- 20
8
votes
3 answers
How to extract chunks from BIO chunked sentences? - python
Given an input sentence that has BIO chunk tags:
[('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed',
'I-NP'), ('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'),
('swallow', 'I-NP'), ('?', 'O')]
I would need to extract the…
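A minimal sketch of one way to group BIO-tagged tokens into chunks, using the sentence from the excerpt. The function name `extract_chunks` and the `(label, phrase)` output format are assumptions, since the excerpt is cut off before the desired output is shown.

```python
def extract_chunks(tagged):
    """Group (token, BIO-tag) pairs into (label, phrase) tuples."""
    chunks = []
    label, words = None, []
    for token, tag in tagged:
        prefix, _, kind = tag.partition("-")
        # An I- tag with the same label continues the open chunk.
        if prefix == "I" and kind == label:
            words.append(token)
            continue
        # Anything else closes the open chunk, if there is one.
        if words:
            chunks.append((label, " ".join(words)))
        # B- opens a new chunk; a stray I- is treated as a chunk start too.
        if prefix in ("B", "I"):
            label, words = kind, [token]
        else:  # "O" means the token is outside any chunk
            label, words = None, []
    if words:
        chunks.append((label, " ".join(words)))
    return chunks


sentence = [('What', 'B-NP'), ('is', 'B-VP'), ('the', 'B-NP'), ('airspeed', 'I-NP'),
            ('of', 'B-PP'), ('an', 'B-NP'), ('unladen', 'I-NP'),
            ('swallow', 'I-NP'), ('?', 'O')]
chunks = extract_chunks(sentence)
```

Filtering the result by label, for example keeping only the `NP` entries, gives the noun phrases.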

alvas
- 115,346
- 109
- 446
- 738
4
votes
1 answer
Chunk a colon in NLTK
I am trying to split a chunk at the position of a colon : in NLTK, but it seems it's a special case. In a normal regex I can just put it in [:] with no problems.
But in NLTK, no matter what I do, the RegexpParser does not accept it.
from nltk import …

yaroze
- 41
- 2
3
votes
1 answer
How to train Chunker in Opennlp?
I need to train the Chunker in OpenNLP to classify the training data as noun phrases. How do I proceed? The online documentation does not explain how to do it from within a program, without the command line. It says to use…

zoozoofreak
- 65
- 1
- 11
3
votes
1 answer
NLTK RegEx Chunker not capturing defined grammar patterns with wildcards
I am trying to chunk a sentence using NLTK's POS tags as regular expressions. Two rules are defined to identify phrases, based on the tags of the words in the sentence.
Mainly, I wanted to capture the chunk of one or more verbs followed by an optional…

Bala
- 193
- 1
- 9
2
votes
1 answer
NLTK Regex Chunker Not Processing multiple Grammar Rules in one command
I am trying to extract phrases from my corpus. For this I have defined two rules: one is a noun followed by multiple nouns, and the other is an adjective followed by a noun. If the same phrase is extracted by both rules, I want the program to ignore…

user3778289
- 323
- 4
- 18
2
votes
3 answers
Not condition in NLTK Regex Parser
I need to create a not condition as part of my grammar in NLTK's regex parser. I would like to chunk those words which are of structure 'Coffee & Tea' but it should not chunk if there is a word of type before the sequence. For example 'in…

Ram G Athreya
- 4,892
- 6
- 25
- 57
2
votes
1 answer
Training IOB Chunker using nltk.tag.brill_trainer (Transformation-Based Learning)
I'm trying to train a specific chunker (let's say a noun chunker for simplicity) by using NLTK's brill module. I'd like to use three features, i.e. word, POS-tag, and IOB-tag.
Ramshaw and Marcus (1995:7) have shown 100 templates which are generated…

user2870222
- 269
- 1
- 3
- 13
2
votes
0 answers
Use Completion Suggester to match against all ngrams in a query
I'd like to know if it's possible to use Elasticsearch's Completion Suggester to match against all ngrams in a query.
What I basically want to do is 'misuse' Completion Suggester to do "Dictionary based chunking".
For example given the sentence:…
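To illustrate what "dictionary based chunking" over all n-grams of a query means, here is a plain-Python sketch. It does not use Elasticsearch or the Completion Suggester; the names `ngrams` and `dictionary_chunks`, the sample dictionary, and the sample query are all hypothetical.

```python
def ngrams(tokens, max_n):
    """Yield (start_index, n-gram tokens) for every n-gram up to max_n words."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, tokens[i:i + n]


def dictionary_chunks(query, dictionary, max_n=3):
    """Return (start_index, phrase) for every n-gram found in the dictionary."""
    tokens = query.lower().split()
    hits = []
    for start, gram in ngrams(tokens, max_n):
        phrase = " ".join(gram)
        if phrase in dictionary:
            hits.append((start, phrase))
    return sorted(hits)


# Hypothetical dictionary and query, for illustration only.
dictionary = {"new york", "york", "pizza"}
hits = dictionary_chunks("best pizza in New York", dictionary)
```

The sketch returns overlapping matches as they are; choosing between them (for example, preferring the longest match) is left open.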

Geert-Jan
- 18,623
- 16
- 75
- 137
1
vote
2 answers
RecursiveCharacterTextSplitter of Langchain doesn't exist
I am trying to chunk text with LangChain's RecursiveCharacterTextSplitter. I have installed langchain (pip install langchain[all]), but the program still reports that there is no RecursiveCharacterTextSplitter package. I use from…

Zhenyu Wang
- 7
- 1
1
vote
1 answer
parsing a sentence - match inflections and skip punctuation
I'm trying to parse sentences in Python. For any sentence I get, I should take only the words that appear after the words 'say' or 'ask' (if those words don't appear, I should take the whole sentence).
I simply did it with regular expressions:
sen =…
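A minimal sketch of the behaviour described, using only the standard library. The list of inflected forms, the choice to cut at the first trigger word, and the function name `words_after_trigger` are assumptions; the asker's own code is truncated and is not reproduced here.

```python
import re
import string

# Inflections of "say" and "ask"; the exact set of forms is an assumption.
TRIGGER = re.compile(r"\b(?:say(?:s|ing)?|said|ask(?:s|ed|ing)?)\b", re.IGNORECASE)


def words_after_trigger(sentence):
    """Return the words after the first say/ask form, or all words if none occurs."""
    match = TRIGGER.search(sentence)
    tail = sentence[match.end():] if match else sentence
    # Strip surrounding punctuation and drop tokens that were punctuation only.
    words = (word.strip(string.punctuation) for word in tail.split())
    return [word for word in words if word]


after_said = words_after_trigger("She said: hello, world!")
no_trigger = words_after_trigger("Hello there.")
```

The `\b` word boundaries keep the pattern from matching inside longer words such as "essay" or "task".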

merav
- 33
- 4
1
vote
1 answer
Constituent tree in Python (NLTK)
I have found this code here:
# Import required libraries
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag, word_tokenize, RegexpParser
# Example text
sample_text = "The quick brown fox…

DanielTheRocketMan
- 3,199
- 5
- 36
- 65
1
vote
2 answers
Conditional chunking of text file in Python
Hopefully this is a pretty straightforward question. I have a transcript that I am trying to split into chunks, one per speaker. The code I currently have is:
text = '''
Speaker 1: hello there
this is some text.
Speaker 2: hello there,
this is…
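One possible approach, sketched under the assumption that every turn starts at the beginning of a line with a label of the form "Speaker N:". The sample transcript below is hypothetical, because the original is truncated, and `split_by_speaker` is an illustrative name.

```python
import re

# Hypothetical transcript; the text in the question is cut off.
transcript = """Speaker 1: hello there
this is some text.
Speaker 2: hello there,
this is also some text.
"""

# The capture group makes re.split keep the speaker labels in its result.
SPEAKER = re.compile(r"^(Speaker \d+):\s*", re.MULTILINE)


def split_by_speaker(text):
    """Return (speaker, speech) pairs, with each speech joined onto one line."""
    parts = SPEAKER.split(text)
    # parts[0] is any text before the first label; after that, labels and
    # speeches alternate.
    labels, speeches = parts[1::2], parts[2::2]
    return [(label, " ".join(speech.split()))
            for label, speech in zip(labels, speeches)]


turns = split_by_speaker(transcript)
```

Consecutive lines without a label stay attached to the most recent speaker, which is the "conditional" part of the chunking.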

cookie1986
- 865
- 12
- 27
1
vote
2 answers
Parse NLTK tree output in a list of noun phrase
I have a sentence
text = '''If you're in construction or need to pass fire inspection, or just want fire resistant materials for peace of mind, this is the one to use. Check out 3rd party sellers as well Skylite'''
I applied NLTK chunking on it…

SpottedLeo
- 33
- 6