I have a column with annotations of sentences in IOB format. A row looks roughly like this:
data['labels'][0] = "['O', 'O', 'O', 'B-l1', 'O', 'B-l1', 'I-l2', 'I-l2', 'O', 'I-l2']"
I want to get the unique labels: 'O', 'B-l1', and 'I-l2'. The idea is to remove all rows that are not annotated, meaning the only label in the list is 'O'.
This is my current code:
list(set(data['labels'][0]))
But it returns each character as a separate element:
'O',
'B',
'-',
'l',
'1',
'I',
'2',
','
which is not what I am looking for.
I would appreciate some help here. Thanks.
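
The character-by-character output suggests that each cell holds a string that only looks like a list, so `set()` iterates over its characters. A minimal sketch of one possible approach, assuming that is the case: parse the string with `ast.literal_eval` first, then collect the unique labels. The helper names `unique_labels` and `is_annotated` are hypothetical, and the sample strings are made up for illustration.

```python
import ast


def unique_labels(cell):
    """Return the sorted unique labels of one cell.

    Assumes the cell is either a real list or a string such as
    "['O', 'B-l1']" that can be parsed into a list.
    """
    labels = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return sorted(set(labels))


def is_annotated(cell):
    """True if the cell contains at least one label other than 'O'."""
    return bool(set(unique_labels(cell)) - {"O"})


# Hypothetical sample cells, mirroring the row shown in the question.
row_with_labels = "['O', 'O', 'O', 'B-l1', 'O', 'B-l1', 'I-l2', 'I-l2', 'O', 'I-l2']"
row_without_labels = "['O', 'O', 'O']"

print(unique_labels(row_with_labels))
print(is_annotated(row_with_labels), is_annotated(row_without_labels))
```

If `data` is a pandas DataFrame, the same helper could be used as a row filter, for example `data[data['labels'].apply(is_annotated)]`.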