Questions tagged [conll]

Use this tag for questions concerning the CoNLL data format, e.g. for CoNLL-X or CoNLL-U data.

CoNLL stands for the Conference on Computational Natural Language Learning. The CoNLL-X data format was introduced for the shared task of the tenth edition of this conference. CoNLL-U is a revised version of that format, used to structure Universal Dependencies data.
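
As a rough illustration of the format: a CoNLL-U file holds one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line between sentences. The sketch below reads one sentence block into dictionaries. The function name `parse_conllu_sentence` and the sample sentence are made up for this example, and the code does no special handling of multiword-token ranges or empty nodes; a dedicated parser library is likely the better choice for real data.

```python
from typing import Dict, List

# The ten CoNLL-U column names, in file order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu_sentence(block: str) -> List[Dict[str, str]]:
    """Parse one sentence block into a list of token dicts (hypothetical helper)."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        tokens.append(dict(zip(CONLLU_FIELDS, columns)))
    return tokens


# Invented sample sentence, for illustration only.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
)

tokens = parse_conllu_sentence(SAMPLE)
for token in tokens:
    print(token["id"], token["form"], token["head"], token["deprel"])
```

All values are kept as strings, so `head` would need an explicit `int()` conversion before being used as an index.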

46 questions
3
votes
1 answer

Creating a custom dataset based on CoNLL2003

I'm working on a named entity recognition (NER) project and would like to create my own dataset based on the CoNLL2003 dataset (link: https://huggingface.co/datasets/conll2003). I've been looking at the CoNLL2003 data and I'm having trouble…
Boudribila
3
votes
0 answers

TensorFlow: Using CRF for NER (shape-mismatch) [tensorflow_addons]

I am trying to build a Bi-LSTM CRF model for NER on the CoNLL-2003 dataset. I have encoded the words using character embeddings and GloVe embeddings; each token has an embedding of size 341. This is my model: def get_model(embed_size, max_seq_len,…
3
votes
1 answer

Parsing CoNLL-U files with NLTK

I know there are CoNLL-U parsers in Python. I would just like to get confirmation that NLTK does not have a native routine to parse CoNLL-U (or other CoNLL formats with dependency syntax). Looking at the code, it seems HEAD and DEP are not among the…
Chiarcos
3
votes
2 answers

How to generate .conllu from a Doc object?

Where can I find an example .conllu file that spaCy will accept, or an example of how to generate one (with IOB)? I am trying to convert a .conllu file I generated to .json for model training, this way: head_ix = token.head.i - sent[0].i + 1 conll.append(…
sten
3
votes
1 answer

What is the list of possible tags with a description of CoNLL 2003 NER Task?

I need to do some NER. I found the DeepPavlov library, which does this. Here is an example from the docs: from deeppavlov import configs, build_model ner_model = build_model(configs.ner.ner_ontonotes, download=True) ner_model(['Bob Ross lived in…
rominf
2
votes
0 answers

NLP in R: working with tokenization in a CONLLU-style dataframe

I am working on a Portuguese Digital Humanities project using R. I created a CoNLL-U-style dataframe from the corpus data, using the UDPipe library: textAnnotated <- udpipe::udpipe_annotate(m_port, x = textCorpus) %>% as.data.frame() The beginning…
2
votes
1 answer

How can I convert CoNLL 2003 format to JSON format?

I have a list of sentences with each word of a sentence being in a nested list. Such as: [['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.'], ['Peter', 'Blackburn'], ['BRUSSELS', '1996-08-22']] And also another list…
2
votes
1 answer

How to solve ValueError: [E177] Ill-formed IOB input detected: an?

I am trying to convert CoNLL-format data into spaCy's JSON format to train a model, using spaCy's convert command. I tried this command: python -m spacy convert conll_dataset.tsv /Users/user/docs -t json -c ner I am getting a…
2
votes
3 answers

Why can't I read in .conll file with Python (confusing parse-error)?

from pyconll import load_from_file data = load_from_file("filename.conll") data I'm following the documentation of pyconll to read in a .conll file, yet the following error occurs and I don't understand what it means. The dataset should be…
Paw in Data
2
votes
1 answer

How to change text sentence into CoNLL-U format?

I am studying dependency parsing using the CoNLL-U format. I can find out how to work with a CoNLL-U parser or token list, but I cannot find how to convert a text sentence into CoNLL-U format. I tried converting code from…
2
votes
1 answer

Converting spaCy-generated dependencies into CoNLL format cannot handle more than one ROOT?

I used the spaCy library to generate dependencies and save them in CoNLL format using the code below. import pandas as pd import spacy df1 = pd.read_csv('cleantweets', encoding='latin1') df1['tweet'] = df1['tweet'].astype(str) tweet_list =…
KoKo
1
vote
0 answers

Parsing Italian CONLLU files to remove lemmas

I am working with Italian Universal Dependency data in CONLLU format, like this: sent_id = VIT-4006 text = "grazie dell'informazione, la metterò nella memoria del mio Macintosh". 1 " " PUNCT FB _ 2 punct _ SpaceAfter=No 2 grazie …
1
vote
1 answer

Convert Prodigy JSONL / Spacy Doc format to CONLL

I have been searching for a while now but haven't found a solution to my problem. For a relation classification task I have annotated several news-like text documents with the Prodigy annotation software. Prodigy outputs the format in a JSONL file…
Jonnyfoka
1
vote
3 answers

Convert spaCy `Doc` into CoNLL 2003 sample

I was planning to train a custom Spark NLP NER model, which uses the CoNLL 2003 format to do so (this blog even leaves some training sample data to speed up the follow-up). This "sample data" is NOT useful for me, as I have my own training data to…
David Espinosa
1
vote
1 answer

Converting spaCy NER entity format to CoNLL 2003 format

I am working on an NER application where I have data annotated in the following format. [('The F15 aircraft uses a lot of fuel', {'entities': [(4, 7, 'aircraft')]}), ('did you see the F16 landing?', {'entities': [(16, 19, 'aircraft')]}), ('how…
imhans33
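
One possible starting point for this kind of conversion, sketched under simplifying assumptions: tokens are split on whitespace (not with spaCy's tokenizer), only the token and a BIO-style entity tag are written (the POS and chunk columns of the original CoNLL 2003 layout are left out), and a token is tagged only when it lies entirely inside an entity span. The helper name `to_bio_lines` is made up for this example.

```python
import re
from typing import List, Tuple

# (start_char, end_char, label), as in the annotation format quoted above.
Entity = Tuple[int, int, str]


def to_bio_lines(text: str, entities: List[Entity]) -> List[str]:
    """Return one 'token tag' line per whitespace-separated token (hypothetical helper)."""
    lines = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in entities:
            # Tag the token only if it sits fully inside the entity span.
            if match.start() >= start and match.end() <= end:
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        lines.append(f"{match.group()} {tag}")
    return lines


example_lines = to_bio_lines(
    "The F15 aircraft uses a lot of fuel", [(4, 7, "aircraft")]
)
print("\n".join(example_lines))
```

Because of the whitespace split, an entity directly followed by punctuation (for example `F16,`) would not match its span and would stay tagged `O`; a real tokenizer would be needed to handle that case.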