I have the following data:
import pandas as pd

df = pd.DataFrame({'orig': ['INOA', 'AFXR', 'GUTR', 'AREB'],
                   'dest': ['AFXR', 'INOA', 'INOA', 'GAPR'],
                   'count': [100, 50, 1, 5]})
orig  dest  count
INOA  AFXR    100
AFXR  INOA     50
GUTR  INOA      1
AREB  GAPR      5
For exporting to another system, I need to generate a unique integer id for each unique value found in either the orig or the dest column. How the id is generated isn't important, as long as it's unique within this data; a simple running sequence is fine. What I'd ideally end up with is a DataFrame like this:
orig_id  dest_id  orig  dest  count
      1        2  INOA  AFXR    100
      2        1  AFXR  INOA     50
      3        1  GUTR  INOA      1
      4        5  AREB  GAPR      5
So INOA=1, AFXR=2, GUTR=3, AREB=4 and GAPR=5.
How would I go about doing this?
I've gotten as far as finding all the unique labels and numbering them:
labels = pd.DataFrame(pd.unique(df[['orig', 'dest']].values.flatten()))
labels.index += 1
Gives:
1 INOA
2 AFXR
3 GUTR
4 AREB
5 GAPR
But I'm not sure how to apply that back to create the two new orig_id and dest_id columns in the original DataFrame, and I'm not sure this is the right way to go either.
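For reference, here is a minimal sketch of one way the numbered labels could be applied back, assuming a plain dict lookup combined with Series.map is acceptable. The name label_to_id is only illustrative, and the ids follow the order in which labels first appear when the two columns are scanned row by row:

```python
import pandas as pd

df = pd.DataFrame({'orig': ['INOA', 'AFXR', 'GUTR', 'AREB'],
                   'dest': ['AFXR', 'INOA', 'INOA', 'GAPR'],
                   'count': [100, 50, 1, 5]})

# Unique labels from both columns, in order of first appearance (row by row).
labels = pd.unique(df[['orig', 'dest']].values.ravel())

# Illustrative lookup table: label -> integer id, starting at 1.
label_to_id = {label: i for i, label in enumerate(labels, start=1)}

# Map each column through the same lookup so ids are shared between them.
df['orig_id'] = df['orig'].map(label_to_id)
df['dest_id'] = df['dest'].map(label_to_id)

# Put the id columns first, as in the desired output.
df = df[['orig_id', 'dest_id', 'orig', 'dest', 'count']]
print(df)
```

Because both columns go through the same lookup, a label such as INOA gets the same id whether it appears in orig or in dest.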