So far I have used the stanfordnlp library in Python to tokenize and POS-tag a dataframe of text, and I would now like to extract noun phrases. I have tried two different things, and I am having problems with both:
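For context, my pipeline looks roughly like this (Danish models downloaded once via `stanfordnlp.download('da')`; I then apply it row by row to the dataframe):

```python
import stanfordnlp

# stanfordnlp.download('da')  # one-time download of the Danish models
# (I don't think Danish needs the 'mwt' processor, but I may be wrong)
nlp = stanfordnlp.Pipeline(lang='da', processors='tokenize,pos,lemma')

doc = nlp("Hunden jagtede katten gennem den gamle park.")
for sentence in doc.sentences:
    for word in sentence.words:
        # each word carries the token text, a universal POS tag, and a lemma
        print(word.text, word.upos, word.lemma)
```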
From what I can see, the stanfordnlp Python library doesn't offer NP chunking out of the box; at least I haven't been able to find a way to do it. As a workaround I made a new dataframe of all words in order with their POS tags and then merged consecutive nouns (roughly as in the sketch below). However, this is very crude and quickly gets complicated for me.
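To make the crudeness concrete, this is roughly the logic I ended up with: walk the words in order and merge runs of consecutive NOUN/PROPN tags into one candidate phrase:

```python
def naive_noun_spans(tagged_words):
    """Merge runs of consecutive nouns into candidate phrases.

    tagged_words: list of (text, upos) tuples in sentence order,
    e.g. taken from a stanfordnlp sentence as (w.text, w.upos).
    """
    spans, current = [], []
    for text, upos in tagged_words:
        if upos in ("NOUN", "PROPN"):
            current.append(text)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

This obviously misses determiners and adjectives, which is a big part of why I'd like a proper chunker.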
I have been able to do this for English text using NLTK (roughly as in the snippet below), so I have also tried to use the Stanford CoreNLP API in NLTK. My problem there is that I need a Danish model when setting up CoreNLP with Maven (which I am very inexperienced with). For approach 1 above I have been using the Danish model found here, but that doesn't seem to be the kind of model I am asked to find; again, I don't exactly know what I am doing, so apologies if I am misunderstanding something here.
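For comparison, the English version that worked for me with plain NLTK looked roughly like this (a simple regexp grammar over Penn Treebank tags, so only a sketch):

```python
import nltk
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')  # one-time

grammar = "NP: {<DT>?<JJ>*<NN.*>+}"  # determiner? adjectives* nouns+
chunker = nltk.RegexpParser(grammar)

tagged = nltk.pos_tag(nltk.word_tokenize("The quick brown fox jumps over the lazy dog"))
tree = chunker.parse(tagged)
for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
    print(" ".join(word for word, tag in subtree.leaves()))
```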
My questions then are (1) whether it is in fact possible to do chunking of NPs in stanfordnlp in Python, (2) whether I can somehow pass the tokenized, POS-tagged, and lemmatized words from stanfordnlp to NLTK and do the chunking there (a rough sketch of what I have in mind is below), or (3) whether it is possible to set up CoreNLP in Danish and then use the CoreNLP API with NLTK.
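For (2), what I have in mind is something like the following: write the chunk grammar over universal POS tags and feed NLTK's RegexpParser the (text, upos) pairs straight from stanfordnlp. The NP pattern here is just my guess at what might work for Danish, so please treat it as a sketch rather than something I know to be right:

```python
import nltk

# Guessed NP pattern over universal POS tags: determiner? adjectives* nouns+
upos_grammar = "NP: {<DET>?<ADJ>*<NOUN|PROPN>+}"
upos_chunker = nltk.RegexpParser(upos_grammar)

def chunk_sentence(sentence):
    """sentence: a stanfordnlp Sentence whose words carry .text and .upos."""
    tagged = [(w.text, w.upos) for w in sentence.words]
    tree = upos_chunker.parse(tagged)
    return [" ".join(tok for tok, tag in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]
```

If this is a reasonable way to go, I'd be happy with it; I just don't know whether mixing the two libraries like this is considered sound.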
Thank you, and apologies for my lack of clarity here.