I finally manage to understand how it works, thanks to this topic in the GoogleGroup diffusion list of BRAT
The text is sent to the Automatic Annotator API as a byte string in the body of a POST request, and the format BRAT required in response from this API is in the form of a dictionary of dictionaries, namel(
{
"T1": {
"type": "WhatEverYouWantString", # must be defined in the annotation.conf file
"offsets": [(0, 2), (10, 12)], # list of tuples of integers that correspond to the start and end position of
"texts": ["to", "go"]
}
"T2" : {
"type": "SomeString",
"offsets":[(start1, stop1), (start2, stop2), ...]
"texts":["string[start1:stop1]", "string[start2:stop2]", ...
}
"T3" : ....
}
THEN, you put this dictionary in a JSON format and you send it back to BRAT.
Note :
- "T1", "T2", ... are mandatory keys (and corresponds to the Term index in the
.ann
file that BRAT generates during manual annotation)
- the keys "type", "offsets" and "texts" are mandatory, otherwise you get some error in the log of BRAT (you can consult these log as explained in the GoogleGroup thread linked above)
- the format of the values are strict ("type" gets a string, "offsets" gets a list of tuple (or list) or integers, "texts" gets a list of strings), otherwise you get BRAT errors
I suppose that the strings in "texts" must corresponds to the "offsets", otherwise there should be an error, or at least a problem with the display of tags (this is already the case if you generate the .ann
files from an automatic detection algorithm and have different start and stop than the associated text)
I hope it helps. I managed to make the API using Flask this morning, but I needed to construct a flask.Response
object to get the correct output format. Also, the incoming format from BRAT to the Flask API could not be catch until I used a flask.request
object with request.get_body()
method.
Also, I have to mention that I was not able to use the examples given in the BRAT GitHub :
I mean I could not make them working, but I'm not familiar at all with API and HTTP packages in Python. At least I figured out what was the correct format for the API response.
Finally, I have no idea how to make relations among entities (i.e. BRAT arrows) format from the API, though
seems to work with such thing.
The GoogleGroup discussion
seems to mention that it is not possible to send relations between entities back from the Automatic Annotation API and make them work with BRAT.
I may try it later :-)