Questions tagged [penn-treebank]

The Penn Treebank Project annotates text for linguistic structure using Treebank II bracketing.

The Penn Treebank Project is located at the University of Pennsylvania.

The Penn Treebank Project annotates naturally-occuring [sic] text for linguistic structure. Most notably, we produce skeletal parses showing rough syntactic and semantic information -- a bank of linguistic trees. We also annotate text with part-of-speech tags, and for the Switchboard corpus of telephone conversations, dysfluency annotation.

15 questions

1 answer

How can I use the complete Penn Treebank dataset inside Python/NLTK?

I'm trying to learn to use the NLTK package in Python. In particular, I need to use the Penn Treebank dataset in NLTK. As far as I know, if I call nltk.download('treebank') I only get 5% of the dataset. However, I have the complete dataset in a tar.gz file…

asked Mar 18 '16 at 08:21

zwlayer

2 answers

Calculating perplexity for training an LSTM on Penn Treebank

I'm implementing language model training on Penn Treebank. I'm adding up the loss for each timestep and then calculating perplexity. This gives me a nonsensically high perplexity of hundreds of billions, even after training for a while. The loss itself decreases…

lstm recurrent-neural-network penn-treebank

asked Dec 29 '17 at 08:02

ytrewq
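
Perplexity is the exponential of the average per-token cross-entropy. Exponentiating a loss that was summed over timesteps (instead of averaged) multiplies the exponent by the number of tokens, which is one plausible source of values in the hundreds of billions. A minimal sketch, assuming the per-token losses are natural-log negative log-likelihoods (the usual cross-entropy convention); the function name perplexity is illustrative:

```python
import math

def perplexity(token_losses):
    """Perplexity from per-token negative log-likelihoods (natural log).

    The losses are averaged first and exponentiated once.
    """
    if not token_losses:
        raise ValueError("need at least one token loss")
    mean_nll = sum(token_losses) / len(token_losses)
    return math.exp(mean_nll)

# A model that gives every correct next word probability 1/100 has a
# per-token loss of ln(100), so its perplexity is 100.
losses = [math.log(100)] * 35
print(round(perplexity(losses), 6))  # 100.0

# Exponentiating the *sum* instead yields roughly 100**35, which is meaningless.
print(f"{math.exp(sum(losses)):.3e}")
```

If the framework already returns a mean loss per batch, average those means weighted by each batch's token count before exponentiating; if the losses are measured in bits, use 2 ** mean instead of exp.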

0 answers

Determine what Treebank tag type can come next

I am using Apache OpenNLP and its POSTaggerME. I have it tagging words with their Penn Treebank tag set values. Is there any functionality out there (doesn't have to be in Apache OpenNLP) that lets you know what kind of word can come next using the…

java nlp opennlp penn-treebank

asked Dec 14 '16 at 01:10

user489041
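
One way to answer this is to count tag bigrams in an already-tagged corpus and normalize them into transition probabilities P(next tag | current tag), the same quantity an HMM tagger estimates. The sketch below is in Python rather than Java and does not use OpenNLP; the function name tag_transitions and the toy sentences are illustrative assumptions, not real corpus data:

```python
from collections import Counter, defaultdict

def tag_transitions(tagged_sentences):
    """Estimate P(next_tag | tag) from sentences of (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        # Boundary markers let sentence-initial and sentence-final tags be counted too.
        tags = ["<s>"] + [tag for _, tag in sentence] + ["</s>"]
        for current, following in zip(tags, tags[1:]):
            counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {tag: n / total for tag, n in followers.items()}
    return probs

# Toy, hand-tagged sentences for illustration only.
corpus = [
    [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("The", "DT"), ("big", "JJ"), ("dog", "NN"), ("sleeps", "VBZ")],
]
transitions = tag_transitions(corpus)
print(sorted(transitions["DT"].items()))  # [('JJ', 0.5), ('NN', 0.5)]
```

Run over a large tagged corpus (for example, POSTaggerME output on plenty of text), the highest-probability entries for a given tag are the likeliest next tags.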

0 answers

Extracting Function Tags from Parsed Sentence (using Stanford Parser)

Looking at the Penn Treebank tag set (http://web.mit.edu/6.863/www/PennTreebankTags.html#RB), there is a section called "Function Tags" that would be extremely helpful for a project I am working on. I know the Stanford Parser uses the Penn Treebank…

python nlp nltk stanford-nlp penn-treebank

asked Jun 13 '17 at 17:48

jdsto
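
Function tags only help if the trees you have still contain them; whether a given parser model outputs them depends on how it was trained. Assuming labels follow the Treebank II convention CATEGORY-FUNCTION (e.g. NP-SBJ, PP-LOC-CLR), with optional numeric coindices (-1) and gapping indices (=2), a sketch that splits them off could look like this (split_label and the sample tree are illustrative):

```python
import re

def split_label(label):
    """Split a Treebank II node label into (category, function_tags).

    Assumes CATEGORY[-FUNC...][-INDEX][=GAP] labels. Labels that start with
    "-" (such as -NONE- or -LRB-) are special and returned unchanged.
    """
    if label.startswith("-"):
        return label, []
    label = re.sub(r"=\d+$", "", label)  # drop a gapping index such as "=2"
    category, *rest = label.split("-")
    # Keep the non-numeric parts; numeric ones are coindices like "-1".
    return category, [part for part in rest if part and not part.isdigit()]

# Illustrative parse; real parser output may not contain function tags at all.
tree = ("(S (NP-SBJ (DT The) (NN dog)) (VP (VBD slept)"
        " (PP-LOC (IN in) (NP (DT the) (NN yard)))))")
labels = re.findall(r"\(([^\s()]+)", tree)
with_functions = [(lab, split_label(lab)[1]) for lab in labels if split_label(lab)[1]]
print(with_functions)  # [('NP-SBJ', ['SBJ']), ('PP-LOC', ['LOC'])]
```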

4 answers

How to reduce the number of POS tags in Penn Treebank? - NLTK (Python)

I used NLTK for part-of-speech tagging, which uses the 36 Penn Treebank tags. I want to reduce the number of tags to 6: noun, verb, adjective, adverb, preposition, conjunction. How should I do so? Is there a specific function, attribute, or command?

nltk pos-tagger penn-treebank

asked May 22 '17 at 16:17

user8049144
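
The six classes are not a standard NLTK tagset, but Penn Treebank tags share prefixes (NN*, VB*, JJ*, RB*), so a small lookup is enough. The groups below are one possible choice, not a standard: MD, TO, RP, pronouns, and determiners fall outside all six classes, and in the Penn Treebank IN also covers subordinating conjunctions. NLTK's nltk.pos_tag(tokens, tagset='universal') is a related built-in option, but it produces the 12-tag universal tagset rather than six classes. COARSE_PREFIXES, coarse_tag, and reduce_tags are illustrative names:

```python
# Hypothetical coarse mapping; adjust the groups to your needs.
COARSE_PREFIXES = [
    ("NN", "noun"),         # NN, NNS, NNP, NNPS
    ("VB", "verb"),         # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "adjective"),    # JJ, JJR, JJS
    ("RB", "adverb"),       # RB, RBR, RBS
    ("IN", "preposition"),  # PTB IN also covers subordinating conjunctions
    ("CC", "conjunction"),
]

def coarse_tag(ptb_tag):
    """Map a Penn Treebank tag to one of six coarse classes, or None."""
    for prefix, coarse in COARSE_PREFIXES:
        if ptb_tag.startswith(prefix):
            return coarse
    return None  # e.g. DT, PRP, MD, CD, WRB, punctuation

def reduce_tags(tagged):
    """Keep only tokens whose tag falls into one of the six classes."""
    return [(w, coarse_tag(t)) for w, t in tagged if coarse_tag(t) is not None]

# Input in the shape nltk.pos_tag returns: a list of (word, tag) pairs.
tagged = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumped", "VBD"),
          ("over", "IN"), ("it", "PRP"), ("and", "CC"), ("ran", "VBD"),
          ("quickly", "RB")]
print(reduce_tags(tagged))
```

Dropping unmapped tokens is a design choice; returning a catch-all label such as "other" instead would keep the sentence length intact.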

1 answer

Syntax error when the yacc file is called

I am trying to build an XTAG parser from source. The relevant files can be fetched from ftp://ftp.cis.upenn.edu/pub/xtag/lem. I understand that this particular TAG parser is decades old and there are plenty of newer options, but I need this specific…

perl nlp yacc penn-treebank

asked Feb 23 '23 at 00:49

aram10

0 answers

How to extract the keywords that the Universal Sentence Encoder was trained on?

I am using the Universal Sentence Encoder (USE) to encode some documents into 512-dimensional embeddings. These are then used to find items similar to a search query, which is also encoded using USE. USE works pretty well on general English words in search…

tensorflow nlp transformer-model sentence-similarity penn-treebank

asked Jul 13 '22 at 21:06

Pratyush

1 answer

How to convert from column-based CoNLL format to the Penn Treebank annotation style?

Does anybody know about any tool, script, etc. to convert from column-based CoNLL format to the Penn Treebank annotation style?

nlp stanford-nlp penn-treebank

asked Apr 03 '17 at 13:08

Tropin
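
The answer depends on which CoNLL format the file uses. Dependency formats such as CoNLL-X or CoNLL-U carry no constituency brackets, so they cannot be turned into Penn Treebank brackets this way. If the file is in the CoNLL-2012 (OntoNotes) style, whose parse-bit column marks each token's position in the tree with strings like (TOP(S(NP* or *)), the bracketing can be rebuilt by replacing each * with a (POS word) leaf and joining a sentence's bits. A sketch, assuming the rows have already been read as (word, POS, parse bit) triples; the function name and example rows are illustrative:

```python
import re

def parse_bits_to_ptb(rows):
    """Rebuild a bracketed tree from CoNLL-2012-style (word, pos, parse_bit) rows."""
    pieces = []
    for word, pos, parse_bit in rows:
        # "*" marks where this token's (POS word) leaf sits in the tree.
        pieces.append(parse_bit.replace("*", f"({pos} {word})"))
    tree = "".join(pieces)
    # Insert a space before every "(" that directly follows another character,
    # turning "(S(NP(DT The)(NN dog))" into "(S (NP (DT The) (NN dog))".
    return re.sub(r"(?<=\S)\(", " (", tree)

rows = [
    ("The", "DT", "(TOP(S(NP*"),
    ("dog", "NN", "*)"),
    ("barks", "VBZ", "(VP*)))"),
]
tree_str = parse_bits_to_ptb(rows)
print(tree_str)  # (TOP (S (NP (DT The) (NN dog)) (VP (VBZ barks))))
```

Assuming the standard CoNLL-2012 column layout, word, POS, and parse bit are the 4th, 5th, and 6th whitespace-separated columns, with sentences separated by blank lines. Words that are literal brackets must already be escaped (as -LRB-/-RRB-) for the result to stay well-formed.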

1 answer

How to generate sentiment treebank in Stanford NLP

I'm using the Stanford NLP sentiment library for sentiment analysis. Now I want to generate a treebank from a sentence. Input sentence: "Effective but too-tepid biopic" Output treebank: (2 (3 (3 Effective) (2 but)) (1 (1 too-tepid) (2 biopic))) Can…

stanford-nlp sentiment-analysis penn-treebank

asked Mar 15 '17 at 04:42

lknguyen

1 answer

Read the complete Penn Treebank dataset from a local directory

I have the complete Penn Treebank dataset and I want to read it using ptb from nltk.corpus. But here it says: If you have access to a full installation of the Penn Treebank, NLTK can be configured to load it as well. Download the ptb…

python nltk penn-treebank

asked Nov 23 '16 at 18:16

Wasi Ahmad
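
For the NLTK route, the reader class behind the bundled treebank sample is nltk.corpus.reader.BracketParseCorpusReader, which takes a root directory and a file-id pattern. Assuming the usual WSJ layout, that might look like BracketParseCorpusReader(root, r'wsj/\d\d/wsj_\d\d\d\d\.mrg'), where root is the local directory containing the wsj folder, after which .parsed_sents() returns the trees. If only the raw bracketed trees are needed, a dependency-free sketch (split_trees and iter_mrg_trees are illustrative names) walks the directory and splits each .mrg file on parenthesis depth:

```python
import os
import re

def split_trees(text):
    """Yield each top-level bracketed tree in text, with whitespace collapsed.

    Assumes brackets inside words are escaped (-LRB-/-RRB-), as in the PTB.
    """
    depth, start = 0, None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # a new top-level tree begins here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                yield re.sub(r"\s+", " ", text[start:i + 1])

def iter_mrg_trees(root):
    """Yield every tree from the .mrg files under root, in sorted path order."""
    for dirpath, _dirnames, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            if name.endswith(".mrg"):
                with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                    yield from split_trees(f.read())

# Two trees in the .mrg layout: each is wrapped in an extra unlabeled "( ... )".
sample = """( (S (NP (NN Dogs))
    (VP (VBP bark))) )
( (S (NP (PRP It)) (VP (VBD rained))) )
"""
for tree in split_trees(sample):
    print(tree)
```

Each yielded string can then be handed to nltk.Tree.fromstring if NLTK tree objects are wanted.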

1 answer

Part-of-Speech tagging: what is the difference between known words and unknown words?

I am trying to understand the results table (Table 1) of this paper. Three accuracies are reported: overall, unknown words (UW), and known words (KW), along with the percentage of unknown words (% unk.). Are the known words the data that…

nlp stanford-nlp part-of-speech penn-treebank

asked Nov 29 '20 at 14:10

AziZ
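
In POS-tagging evaluations, "unknown words" usually means test tokens whose word form never appears in the training data, and "known words" are the rest. Accuracy is computed separately on each group, and % unk. is the share of test tokens that are unknown. A particular paper's exact definition may differ (for example in case handling), so treat this as a sketch of the common convention; the function name and toy data are illustrative:

```python
def known_unknown_accuracy(train_words, test_tokens):
    """Overall, known-word (KW) and unknown-word (UW) tagging accuracy.

    train_words: word forms seen in the training data.
    test_tokens: (word, gold_tag, predicted_tag) triples from the test set.
    A test token counts as unknown if its word form never occurs in training.
    """
    vocab = set(train_words)
    correct = {"known": 0, "unknown": 0}
    total = {"known": 0, "unknown": 0}
    for word, gold, predicted in test_tokens:
        group = "known" if word in vocab else "unknown"
        total[group] += 1
        correct[group] += int(gold == predicted)

    def ratio(num, den):
        return num / den if den else float("nan")

    n = len(test_tokens)
    return {
        "overall": ratio(correct["known"] + correct["unknown"], n),
        "KW": ratio(correct["known"], total["known"]),
        "UW": ratio(correct["unknown"], total["unknown"]),
        "% unk.": 100 * ratio(total["unknown"], n),
    }

train = ["the", "dog", "barks"]
test = [("the", "DT", "DT"), ("dog", "NN", "NN"),
        ("meows", "VBZ", "NNS"), ("cat", "NN", "NN")]
print(known_unknown_accuracy(train, test))
```

Here "meows" and "cat" are unknown, so half of the test tokens are unknown, and the one tagging error lowers only the UW figure.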

1 answer

Hebrew Stanford NLP tag set

I am trying to find the exact tag set used in the Hebrew treebank that Stanford NLP uses. Finding this tag set seems to be harder than finding a POS tagger :) Are there any tools for reading the tag set used for training a (Penn?) treebank?

nlp stanford-nlp hebrew pos-tagger penn-treebank

asked Oct 08 '19 at 18:32

rubmz
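
Assuming the treebank's training trees are available in Penn-style bracketed form, the tag set actually used can be read off the data by collecting every preterminal label, i.e. every node of the form (TAG word). A stdlib sketch (PRETERMINAL, tag_inventory, and the sample tree are illustrative); note that empty-element labels such as -NONE- would also be counted if the files contain them:

```python
import re
from collections import Counter

# A preterminal is "(TAG word)": a label, one token, then a closing bracket.
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def tag_inventory(treebank_text):
    """Count preterminal (POS) labels in Penn-style bracketed text."""
    return Counter(tag for tag, _word in PRETERMINAL.findall(treebank_text))

sample = "( (S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .)) )"
print(sorted(tag_inventory(sample)))  # ['.', 'DT', 'NN', 'VBZ']
```

For a whole treebank, run it over each file and add the resulting Counters; the keys are the tag set and the counts show how often each tag occurs.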

1 answer

Entities containing underscore character are split into multiple entities by TokensAnnotation in CoreNLP

I am observing that CoreNLP 3.9.2 has started splitting enti_ties into multiple tokens like 'enti', '_', 'ties' while tokenizing. I have tried using tokenize.whitespace, which solves this problem, but I think this will stop splitting tokens for…

stanford-nlp tokenize penn-treebank

asked Jul 25 '19 at 13:33

Ishant Wankhede

1 answer

How to learn a language model?

I'm trying to train an LSTM language model on the Penn Treebank (PTB) corpus. I was thinking that I should simply train with every bigram in the corpus so that it could predict the next word given previous words, but then it wouldn't be able…

machine-learning nlp lstm language-model penn-treebank

asked Nov 15 '17 at 00:05

ytrewq
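
The usual setup for an LSTM language model on PTB is not a set of independent bigrams: the corpus is treated as one token stream and cut into contiguous windows, and at every position the target is simply the next token. Because the LSTM's hidden state carries the earlier tokens of the window (and, with truncated backpropagation through time, state from earlier windows), each prediction can condition on more than the previous word. A framework-independent sketch of building the (input, target) pairs; lm_windows and seq_len are illustrative names:

```python
def lm_windows(tokens, seq_len):
    """Cut one long token stream into contiguous (inputs, targets) windows.

    targets are the inputs shifted one position to the left, so at every
    position the model is trained to predict the next token.
    """
    windows = []
    for start in range(0, len(tokens) - 1, seq_len):
        targets = tokens[start + 1:start + seq_len + 1]
        inputs = tokens[start:start + len(targets)]  # last window may be shorter
        windows.append((inputs, targets))
    return windows

for inputs, targets in lm_windows("the cat sat on the mat .".split(), 3):
    print(inputs, "->", targets)
# ['the', 'cat', 'sat'] -> ['cat', 'sat', 'on']
# ['on', 'the', 'mat'] -> ['the', 'mat', '.']
```

In the training loop, the hidden state at the end of one window is passed into the next window without backpropagating through it, which is what lets context flow across window boundaries.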

-1 votes

1 answer

Finding span of each node in NLTK tree

I am new to NLTK and finding it hard to work with NLTK trees. Given an NLTK parse tree from the Penn Treebank, I want to compute the span of each node recursively, bottom-up. The span of a leaf node is 1, and the span of a non-terminal node is…

python tree nltk text-parsing penn-treebank

asked Jun 06 '17 at 09:20

JustSomeone
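
With an nltk.Tree, len(subtree.leaves()) already gives a node's span, but it rescans the leaves for every node. A single bottom-up (post-order) pass computes all spans at once: a word spans 1, and a non-terminal spans the sum of its children's spans. The sketch below uses a small hand-written bracket parser instead of NLTK so it runs on its own; parse_bracketed and spans are illustrative names:

```python
import re

def parse_bracketed(s):
    """Parse PTB-style brackets into nested (label, children) tuples; words are str."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # skip "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return parse_node()

def spans(node, out):
    """Post-order pass: append (label, span) for each non-terminal, return the node's span."""
    if isinstance(node, str):
        return 1  # a word spans exactly one position
    label, children = node
    span = sum(spans(child, out) for child in children)
    out.append((label, span))
    return span

tree = parse_bracketed("(S (NP (DT The) (NN dog)) (VP (VBZ barks)))")
result = []
print(spans(tree, result))  # 3
print(result)
```

Children are visited before their parent, so every span is available exactly when the parent needs it.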