I have been searching for a while now but haven't found a solution to my problem. For a relation classification task, I have annotated several news-like text documents with the Prodigy annotation tool. Prodigy exports the annotations as a JSONL file, which can be converted into a .spacy file. In the JSONL file, each line holds one news article together with its annotations.
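Since every line of the JSONL export is a self-contained JSON object, the file can be read with nothing but Python's standard library. A minimal sketch (the function name `parse_prodigy_jsonl` and the file name in the usage comment are hypothetical):

```python
import json
from typing import Iterable, Iterator


def parse_prodigy_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one annotated record (one article) per non-empty JSONL line."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_no}: invalid JSON ({err.msg})") from err


# Usage with a hypothetical file name:
# with open("annotations.jsonl", encoding="utf-8") as f:
#     for record in parse_prodigy_jsonl(f):
#         print(record["text"][:50], len(record.get("spans", [])))
```

Taking an iterable of lines rather than a path keeps the parser usable with an open file, a list of strings, or any other line source.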
Now I want to convert my annotations into a more standardized format such as CoNLL, so that I can use them with other open-source software like INCEpTION (unfortunately, Prodigy has not been a good choice for me). However, I haven't found any library, script, or tool that converts Prodigy JSONL or .spacy files to CoNLL.
Here is an example of what a Prodigy JSONL record looks like (pretty-printed here; in the actual file each record sits on a single line):
{
"text": "My mother’s name is Sasha Smith. She likes dogs and pedigree cats.",
"tokens": [
{"text": "My", "start": 0, "end": 2, "id": 0, "ws": true},
{"text": "mother", "start": 3, "end": 9, "id": 1, "ws": false},
{"text": "’s", "start": 9, "end": 11, "id": 2, "ws": true},
{"text": "name", "start": 12, "end": 16, "id": 3, "ws": true},
{"text": "is", "start": 17, "end": 19, "id": 4, "ws": true},
{"text": "Sasha", "start": 20, "end": 25, "id": 5, "ws": true},
{"text": "Smith", "start": 26, "end": 31, "id": 6, "ws": true},
{"text": ".", "start": 31, "end": 32, "id": 7, "ws": true, "disabled": true},
{"text": "She", "start": 33, "end": 36, "id": 8, "ws": true},
{"text": "likes", "start": 37, "end": 42, "id": 9, "ws": true},
{"text": "dogs", "start": 43, "end": 47, "id": 10, "ws": true},
{"text": "and", "start": 48, "end": 51, "id": 11, "ws": true, "disabled": true},
{"text": "pedigree", "start": 52, "end": 60, "id": 12, "ws": true},
{"text": "cats", "start": 61, "end": 65, "id": 13, "ws": true},
{"text": ".", "start": 65, "end": 66, "id": 14, "ws": false, "disabled": true}
],
"spans": [
{"start": 20, "end": 31, "token_start": 5, "token_end": 6, "label": "PERSON"},
{"start": 43, "end": 47, "token_start": 10, "token_end": 10, "label": "NP"},
{"start": 52, "end": 65, "token_start": 12, "token_end": 13, "label": "NP"}
],
"relations": [
{
"head": 0,
"child": 1,
"label": "POSS",
"head_span": {"start": 0, "end": 2, "token_start": 0, "token_end": 0, "label": null},
"child_span": {"start": 3, "end": 9, "token_start": 1, "token_end": 1, "label": null}
},
{
"head": 1,
"child": 8,
"label": "COREF",
"head_span": {"start": 3, "end": 9, "token_start": 1, "token_end": 1, "label": null},
"child_span": {"start": 33, "end": 36, "token_start": 8, "token_end": 8, "label": null}
},
{
"head": 9,
"child": 13,
"label": "OBJECT",
"head_span": {"start": 37, "end": 42, "token_start": 9, "token_end": 9, "label": null},
"child_span": {"start": 52, "end": 65, "token_start": 12, "token_end": 13, "label": "NP"}
}
]
}
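For the token-level part of a record like this, a CoNLL-style file can be sketched as one token per row with a BIO tag derived from `spans`. The sketch below assumes that `token_end` is inclusive (as for "Sasha Smith", with `token_start` 5 and `token_end` 6), that spans do not overlap (a later span would overwrite an earlier one), and a simple tab-separated column layout (index, token, tag) that may not match the exact CoNLL variant a given tool expects. The plain token/tag columns have no place for the `relations`, so they are returned separately as token ranges taken from `head_span` and `child_span`. All function names are hypothetical.

```python
from typing import List, Tuple


def record_to_bio(record: dict) -> List[Tuple[str, str]]:
    """Pair each token's text with a BIO tag built from the record's spans."""
    tags = ["O"] * len(record["tokens"])
    for span in record.get("spans", []):
        label = span["label"]
        start, end = span["token_start"], span["token_end"]  # end is inclusive
        tags[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"
    return [(tok["text"], tag) for tok, tag in zip(record["tokens"], tags)]


def record_to_relations(record: dict) -> List[Tuple[int, int, str, int, int]]:
    """Return (head_start, head_end, label, child_start, child_end) tuples."""
    rels = []
    for rel in record.get("relations", []):
        head, child = rel["head_span"], rel["child_span"]
        rels.append((head["token_start"], head["token_end"], rel["label"],
                     child["token_start"], child["token_end"]))
    return rels


def to_conll(record: dict) -> str:
    """One 'index<TAB>token<TAB>tag' row per token, newline-terminated."""
    rows = [f"{i}\t{text}\t{tag}" for i, (text, tag) in enumerate(record_to_bio(record))]
    return "\n".join(rows) + "\n"
```

Several documents can then be written with a blank line between them, e.g. `"\n".join(to_conll(r) for r in records)`.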
Thanks in advance
I want to convert either the Prodigy JSONL file or the .spacy annotation file into CoNLL.
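For the .spacy route, spaCy's `DocBin` can load the file, and each token exposes its entity IOB marker (`ent_iob_`) and label (`ent_type_`). This sketch assumes the labelled spans ended up in `doc.ents`; if they were stored in `doc.spans` instead, they will not appear here, and relations are not covered either. The function names are hypothetical.

```python
from typing import Iterable

import spacy
from spacy.tokens import Doc, DocBin


def docs_to_conll(docs: Iterable[Doc]) -> str:
    """One 'token<TAB>tag' row per token, blank line between documents."""
    blocks = []
    for doc in docs:
        rows = []
        for token in doc:
            iob = token.ent_iob_  # "B", "I", "O", or "" if entities were never set
            tag = f"{iob}-{token.ent_type_}" if iob in ("B", "I") else "O"
            rows.append(f"{token.text}\t{tag}")
        blocks.append("\n".join(rows))
    return "\n\n".join(blocks) + "\n"


def spacy_file_to_conll(path: str, lang: str = "en") -> str:
    """Load a .spacy file and convert its documents to CoNLL-style rows."""
    nlp = spacy.blank(lang)  # only the vocab is needed to deserialize the docs
    doc_bin = DocBin().from_disk(path)
    return docs_to_conll(doc_bin.get_docs(nlp.vocab))
```

The `lang` argument should match the language the .spacy file was created with, since the documents are restored against that pipeline's vocabulary.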