Get unique elements from list row in pandas

Question

I have a column with annotations of sentences in IOB format. A row looks roughly like this:

data['labels'][0] = '['O', 'O', 'O', 'B-l1', 'O', 'B-l1', 'I-l2', 'I-l2', 'O', 'I-l2']'

I want to get the unique labels: 'O', 'B-l1', and 'I-l2'. The idea is to remove all rows that are not annotated, meaning the only label in the list is 'O'.

This is my current code:

list(set(data['labels'][0]))

But it returns each character as a separate element:

'O',
'B',
'-',
'l',
'1',
'I',
'2',
','

which is not what I am looking for.

I would appreciate some help here. Thanks.

`data["labels"].apply(lambda x: set(i for i in x if "-" not in i))`? — Chris, Oct 04 '22 at 10:17

mozway · Accepted Answer · 2022-10-04T10:20:39.127

1

To filter your rows, you can use set operations:

S = {'O'}

data[[not S.issuperset(l) for l in data['labels']]]

Example input:

import pandas as pd

data = pd.DataFrame({'labels': [['O'], ['O', 'B-l1'], []]})

Output:

      labels
1  [O, B-l1]
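
The empty list in row 2 is dropped as well: `S.issuperset(l)` is true whenever every element of `l` is in `S`, which holds trivially for an empty list. A small plain-Python sketch of the mask computed above, reusing the example rows (the names `rows` and `keep` are only for illustration):

```python
S = {'O'}

# Same three rows as in the example input above.
rows = [['O'], ['O', 'B-l1'], []]

# A row is kept only if it contains at least one label outside S.
keep = [not S.issuperset(r) for r in rows]
print(keep)
```

If empty rows should be kept instead, the condition would need an extra check on the list length.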

Converting from strings

If you have string representations of lists:

import ast

data['labels'] = [list(set(ast.literal_eval(l))) for l in data['labels']]
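
Putting both steps together, here is a minimal sketch that assumes every cell in `labels` is a string representation of a list, as in the question. The sample frame and the `unique` column name are made up for illustration, and `sorted` is used only to get a deterministic order:

```python
import ast

import pandas as pd

# Hypothetical sample data: each cell is a *string* that looks like a list.
data = pd.DataFrame({
    'labels': [
        "['O', 'O', 'O']",
        "['O', 'B-l1', 'O', 'I-l2', 'I-l2']",
        "[]",
    ]
})

# Parse each string into a real list, then reduce it to its unique labels.
data['unique'] = [sorted(set(ast.literal_eval(s))) for s in data['labels']]

# Keep only rows that contain at least one label other than 'O'.
S = {'O'}
annotated = data[[not S.issuperset(u) for u in data['unique']]]
print(annotated)
```

Only the second row survives the filter, because it is the only one with a label outside `{'O'}`.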

edited Oct 04 '22 at 10:20

answered Oct 04 '22 at 09:00

mozway


I need the unique labels per row. And for some reason, I don't get the unique labels with this code even though I copy-pasted it. – Yana Oct 04 '22 at 10:08
I thought you wanted to filter the rows. To get unique values: `data['labels'] = [list(set(l)) for l in data['labels']]`. You can then perform filtering if you want both. – mozway Oct 04 '22 at 10:11
Yes, this is what I did, but the result is: `'O', 'B', 'l', '1', '2', '-', ','`. And when I get a specific row, it is returned as a list in single quotes. For example, `data['labels'][0]` returns the list in single quotes like this: `'['O', 'O', 'O', 'B-l1', 'O', 'B-l1', 'I-l2', 'I-l2', 'O', 'I-l2']'` – Yana Oct 04 '22 at 10:14
Then your real data is not what you showed. This means that you have a string. You can convert using `ast.literal_eval`. `data['labels'] = [list(set(ast.literal_eval(l))) for l in data['labels']]` – mozway Oct 04 '22 at 10:15
lifesaver :) Thank you. I will edit my question, to make it clearer for the other people. Can you add your solution to the answer above? – Yana Oct 04 '22 at 10:19
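
The symptom discussed in these comments can be reproduced without pandas. A small sketch, using a shortened, made-up cell value: calling `set` on a string iterates over its characters, while parsing it first with `ast.literal_eval` yields the labels.

```python
import ast

# A cell stored as text rather than as a list (shortened, illustrative value).
cell = "['O', 'B-l1', 'O', 'I-l2']"

# Checking the type is a quick way to spot the problem.
print(type(cell).__name__)

# Iterating a string yields single characters, hence the odd result.
chars = set(cell)

# Parse the text into a real Python list first, then deduplicate.
labels = set(ast.literal_eval(cell))
print(labels)
```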

score 0 · Answer 2 · answered Oct 04 '22 at 09:23

Another possible solution, based on numpy.unique:

import numpy as np

lst = ['O', 'O', 'O', 'B-l1', 'O', 'B-l1', 'I-l2', 'I-l2', 'O', 'I-l2']

np.unique(lst).tolist()

Output:

['B-l1', 'I-l2', 'O']
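
Note that `np.unique` returns its result sorted. If the order of first appearance matters, one alternative (not part of this answer, just a plain-Python sketch) is `dict.fromkeys`, which deduplicates while preserving insertion order:

```python
lst = ['O', 'O', 'O', 'B-l1', 'O', 'B-l1', 'I-l2', 'I-l2', 'O', 'I-l2']

# dict keys are unique and keep insertion order (Python 3.7+).
unique_in_order = list(dict.fromkeys(lst))

# Sorting a set gives the same ordering as the np.unique output above.
unique_sorted = sorted(set(lst))

print(unique_in_order)
print(unique_sorted)
```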

answered Oct 04 '22 at 09:23

PaulS

