I have a dataset where each row is a sample, and a column (name "Sample_ID") names each sample (df1 below). Some samples are repeated multiple times (i.e. have identical values for "Sample_ID"). I would like to generate a new column with different names for each sample (I'll call it "Sample_code") based on a simple ascending pattern (e.g. SAMP001, SAMP002, SAMP003 etc) from the first row to the last row in the table. But rows with identical Sample_IDs need to have identical Sample_code values as well (so I can't simply generate an ascending set of sample names for the new column).
In the example data below, df1 represents my starting data. df2 is what I want to end up with: the Sample_code column values ascend as you go down each row, but with the same value for the rows where Sample_ID is duplicated.
I'm quite puzzled where to start so any help would be much appreciated, thank you.
import numpy as np
import pandas as pd
# df1
data1 = {'Sample_ID': ['123123','123456','123123','123789','456789', '123654'],
'Variable_A': [15,12,7,19,3,12],
'Variable_B':["blue","red","red","blue","blue", "red"]}
df1 = pd.DataFrame(data1)
# df2
data2 = {'Sample_ID': ['123123','123456','123123','123789','456789', '123654'],
'Sample_code' : ['SAMP001', 'SAMP002', 'SAMP001', 'SAMP003', 'SAMP004', 'SAMP005'],
'Variable_A': [15,12,7,19,3,12],
'Variable_B':["blue","red","red","blue","blue", "red"]}
df2 = pd.DataFrame(data2)
df1
df2
EDIT Ideally I would like to have the ascending Sample_code names be in the original order of the rows, as the rows in the starting dataset are ordered by date of collection. I'd like the Sample_code names to be based on the first time a particular sample appears as you go down the rows. A new illustrative df3 has the date column to give a sense of what I mean.
# df3
data3 = {'Sample_ID': ['123123','123456','123123','123789','456789',
'123654', '123123', '123789'],
'Date' : ['15/06/2019', '23/06/2019', '30/06/2019', '07/07/2019',
'15/07/2019', '31/07/2019', '12/08/2019', '27/08/2019'],
'Variable_A': [15,12,7,19,3,12,7,9],
'Variable_B':["blue","red","red","blue","blue", "red","blue", "red"]}
df3 = pd.DataFrame(data3)
df3
The solution suggested below works, but it creates Sample_code names based on the final row in which the repeated Sample_ID values appear, e.g. Sample_ID "123123" is labelled "SAMP006" (for the final row this value appears), but I'd like this one to be "SAMP001" (the first row in which it appears).
lookup = {}
for i, sample_name in enumerate(df3.Sample_ID):
lookup[sample_name] = f'SAMP{i:03}'
df3['Sample_code'] = df3.Sample_ID.apply(lambda x: lookup[x])
df3