I want to get every distinct name in the Pfam_domains column, i.e. each domain that appears at least once, listed only once.
Here is my dataframe:
     TCID        Fonction       Genbank        Uniprot  Pfam_domains
0    3.A.1.1.1   MalE           MalE           P0AEX9   PF00528
1    3.A.1.1.1   MalF           MalF           P02916   PF01547
2    3.A.1.1.1   MalG           MalG           P68183   PF00528
3    3.A.1.1.1   MalK           MalK           P68187   PF00005
4    3.A.1.1.1   MalK           MalK           P68187   PF17912
..   ...         ...            ...            ...      ...
178  3.A.1.5.32  LAC30SC_07295  LAC30SC_07295  F0TFS7   PF00528
179  3.A.1.5.32  LAC30SC_07300  LAC30SC_07300  F0TFS8   PF00528
180  3.A.1.5.32  LAC30SC_07305  LAC30SC_07305  F0TFS9   PF00005
181  3.A.1.5.32  LAC30SC_07305  LAC30SC_07305  F0TFS9   PF08352
182  3.A.1.5.32  LAC30SC_07310  LAC30SC_07310  F0TFT0   PF00005
This is my code:
for i in range(1, len(df)-1):
    unite=pd.unique(df['Pfam_domains'][i])
The problem is that this lists every occurrence of every domain, duplicates included, instead of listing each domain once.
Here is the output I would like:
"PF00528"
"PF01547"
"PF00005"
...
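One possible approach, sketched here on only the first five rows shown above: call unique() on the whole column rather than on one cell at a time inside a loop. The small DataFrame below is a stand-in built for illustration (the real df has 183 rows), and unique_domains is a name made up for this sketch.

```python
import pandas as pd

# Stand-in for the real dataframe: only the first five rows from the question.
df = pd.DataFrame({
    "Uniprot": ["P0AEX9", "P02916", "P68183", "P68187", "P68187"],
    "Pfam_domains": ["PF00528", "PF01547", "PF00528", "PF00005", "PF17912"],
})

# Series.unique() works on the whole column at once and returns each
# distinct value a single time, in order of first appearance.
unique_domains = df["Pfam_domains"].unique()

for name in unique_domains:
    print(name)
```

If a Series is more convenient than an array (for example to keep the original row labels), df["Pfam_domains"].drop_duplicates() should give the same values.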