If you simply want a unique identifier for each combination of GENUS and SPECIES, you can do the following.
Note: I have assumed that either GENUS or SPECIES can contain a None value, which complicates the process slightly.
So, given a DataFrame of the form:
          GENUS       SPECIES
0        Murina    Coelodonta
1        Murina  Microtherium
2  Microtherium        Murina
3  Bachitherium  Microtherium
4    Coelodonta          None
5    Coelodonta    Coelodonta
6  Microtherium    Coelodonta
7  Microtherium        Murina
8  Microtherium  Bachitherium
9        Murina  Microtherium
Add a column that uniquely identifies each combination of GENUS and SPECIES; we call this column 'ID'.
First, define a function that hashes the two entries, taking into account the possibility of a None entry:
def hashValues(g, s):
    # Treat a missing value as the literal string 'None' so the concatenation works
    if g is None:
        g = 'None'
    if s is None:
        s = 'None'
    return hash(g + s)
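One caveat: Python randomizes the built-in hash() of strings per process (unless PYTHONHASHSEED is set), so the IDs produced this way change from one session to the next. Plain concatenation can also make two different pairs hash the same string, e.g. "ab" + "c" and "a" + "bc". If reproducible IDs matter, something along these lines could be substituted. This is only a sketch: stable_id is a hypothetical name, and the "\x1f" separator assumes that character never appears in your data.

```python
import hashlib


def stable_id(g, s):
    """Hypothetical alternative to hashValues: same ID in every Python session."""
    # Treat a missing value as the literal string "None", as hashValues does.
    g = "None" if g is None else g
    s = "None" if s is None else s
    # Join with a separator so ("ab", "c") and ("a", "bc") give different keys.
    key = f"{g}\x1f{s}".encode("utf-8")
    # Keep the first 8 bytes of the SHA-256 digest as a signed 64-bit integer,
    # so the result still fits in an int64 column.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big", signed=True)
```

It takes the same two arguments as hashValues, so it can be used in its place when building the column.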
To add the column, use the following:
df['ID'] = [hashValues(g, s) for g, s in zip(df['GENUS'], df['SPECIES'])]
which yields:
          GENUS       SPECIES                    ID
0        Murina    Coelodonta  -6583287505830614713
1        Murina  Microtherium   6019734726691011903
2  Microtherium        Murina  -2318069015748475190
3  Bachitherium  Microtherium   5795352218934423262
4    Coelodonta          None   4851538573581845777
5    Coelodonta    Coelodonta  -5115794138222494493
6  Microtherium    Coelodonta   2603682196287415014
7  Microtherium        Murina  -2318069015748475190
8  Microtherium  Bachitherium  -2746445536675711990
9        Murina  Microtherium   6019734726691011903
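If the IDs do not have to be hashes, small consecutive integers may be easier to work with. The following is a sketch of that alternative, not the approach used above: it builds one key string per row and lets pd.factorize number the distinct keys in order of first appearance. The "|" separator is my assumption and should be a character that does not occur in your GENUS or SPECIES values.

```python
import pandas as pd

# The example DataFrame from above, with one missing SPECIES.
df = pd.DataFrame({
    "GENUS": ["Murina", "Murina", "Microtherium", "Bachitherium", "Coelodonta",
              "Coelodonta", "Microtherium", "Microtherium", "Microtherium", "Murina"],
    "SPECIES": ["Coelodonta", "Microtherium", "Murina", "Microtherium", None,
                "Coelodonta", "Coelodonta", "Murina", "Bachitherium", "Microtherium"],
})

# Replace missing values with the string "None", then build one key per row.
keys = df["GENUS"].fillna("None") + "|" + df["SPECIES"].fillna("None")

# factorize returns (codes, uniques); the codes are integers starting at 0,
# and identical keys always receive the same code.
df["ID"] = pd.factorize(keys)[0]
print(df)
```

Rows with the same GENUS and SPECIES get the same ID, and the numbering is the same on every run for the same input order.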