How to list unique names in a column of a dataframe?

Question

I want to retrieve every name that appears at least once in the Pfam_domains column.

Here is my dataframe:

           TCID       Fonction        Genbank Uniprot Pfam_domains
0     3.A.1.1.1           MalE           MalE  P0AEX9      PF00528
1     3.A.1.1.1           MalF           MalF  P02916      PF01547
2     3.A.1.1.1           MalG           MalG  P68183      PF00528
3     3.A.1.1.1           MalK           MalK  P68187      PF00005
4     3.A.1.1.1           MalK           MalK  P68187      PF17912
..          ...            ...            ...     ...          ...
178  3.A.1.5.32  LAC30SC_07295  LAC30SC_07295  F0TFS7      PF00528
179  3.A.1.5.32  LAC30SC_07300  LAC30SC_07300  F0TFS8      PF00528
180  3.A.1.5.32  LAC30SC_07305  LAC30SC_07305  F0TFS9      PF00005
181  3.A.1.5.32  LAC30SC_07305  LAC30SC_07305  F0TFS9      PF08352
182  3.A.1.5.32  LAC30SC_07310  LAC30SC_07310  F0TFT0      PF00005

This is my code:

for i in range(1, len(df)-1):
    unite=pd.unique(df['Pfam_domains'][i])

The problem is that this lists every occurrence of every domain, duplicates included, instead of each domain once.

Here is the output I would like:

"PF00528"
"PF01547"
"PF00005"
...

I think you're just looking for `df.Pfam_domains.unique()`; there's no need to iterate over each row — sacuL, Jul 31 '20 at 17:26
Actually, the loop wasn't necessary. Thank you for your help. — lmj, Jul 31 '20 at 19:39
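A minimal runnable sketch of the suggestion in the comments, assuming a small hypothetical DataFrame built from the first five rows shown above (only the Fonction and Pfam_domains columns are kept). For contrast, it also shows an explicit loop that gives the same result. Note that the original range(1, len(df)-1) would skip both the first and the last row.

```python
import pandas as pd

# Hypothetical sample: the first five rows of the question's DataFrame
df = pd.DataFrame({
    "Fonction": ["MalE", "MalF", "MalG", "MalK", "MalK"],
    "Pfam_domains": ["PF00528", "PF01547", "PF00528", "PF00005", "PF17912"],
})

# Vectorized: each distinct value once, in order of first appearance
unique_domains = df["Pfam_domains"].unique()
print(list(unique_domains))  # ['PF00528', 'PF01547', 'PF00005', 'PF17912']

# Equivalent explicit loop; range(len(df)) covers every row, including the first and last
seen = []
for i in range(len(df)):
    name = df["Pfam_domains"].iloc[i]
    if name not in seen:  # keep only the first occurrence of each name
        seen.append(name)
print(seen)  # ['PF00528', 'PF01547', 'PF00005', 'PF17912']
```

The vectorized call is shorter and avoids the off-by-one risk of hand-written index ranges.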

score 1 · Answer 1 · answered Jul 31 '20 at 17:32

I believe this is what you're looking for.

unite = df['Pfam_domains'].unique()  # distinct values, in order of first appearance
unite.sort()                         # sort the result in place
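A sketch of the same idea that also prints the names in the quoted, one-per-line format requested in the question, assuming a hypothetical sample DataFrame. It uses the built-in sorted(), which returns a new list rather than sorting in place, and calls dropna() on the assumption that missing values should be ignored.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame
df = pd.DataFrame({
    "Pfam_domains": ["PF00528", "PF01547", "PF00528", "PF00005", "PF17912"],
})

# Drop missing values, keep distinct names, sort them into a plain Python list
unite = sorted(df["Pfam_domains"].dropna().unique())

# Print each name on its own line, wrapped in double quotes
for name in unite:
    print(f'"{name}"')
```

With this sample it prints "PF00005", "PF00528", "PF01547" and "PF17912", one per line.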

rhug123

score 0 · Answer 2 · answered Jul 31 '20 at 17:28

Start by looking at the value counts, sorted in descending order (value_counts() already sorts this way by default):

df.Pfam_domains.value_counts().sort_values(ascending=False)

Every value present in the column appears at least once by definition, so the index of this result is exactly the list of values "mentioned at least once". If you're actually looking for values that appear more than once, this is also a good starting point: keep only the entries whose count is greater than 1.
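A minimal sketch of both readings of "mentioned at least once", assuming a hypothetical sample DataFrame. The index of value_counts() gives every distinct name, and a boolean filter on the counts gives the names that repeat. Note that value_counts() drops missing values by default.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame
df = pd.DataFrame({
    "Pfam_domains": ["PF00528", "PF01547", "PF00528", "PF00005", "PF17912"],
})

# Occurrences of each name; value_counts() sorts by count, descending, by default
counts = df["Pfam_domains"].value_counts()

# Every name that appears at least once is simply the index of the counts
at_least_once = counts.index.tolist()

# Names that appear more than once: keep only counts greater than 1
more_than_once = counts[counts > 1].index.tolist()

print(at_least_once)
print(more_than_once)  # ['PF00528']
```

The order of names with equal counts is not something to rely on; sort the list explicitly if a fixed order matters.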
