2

I have the following data:

df = pd.DataFrame({'orig':['INOA','AFXR','GUTR','AREB'],
                   'dest':['AFXR','INOA','INOA','GAPR'],
                   'count':[100,50,1,5]})

orig    dest    count
INOA    AFXR    100
AFXR    INOA    50
GUTR    INOA    1
AREB    GAPR    5

For exporting to another system, I need to generate a unique integer id for all the unique values in both the orig and dest column. How the unique id is generated isn't important, as long as it's unique for this data - a unique sequence is fine. What I'd ideally end up with is a DataFrame looking like e.g.

orig_id dest_id orig    dest    count
1       2       INOA    AFXR    100
2       1       AFXR    INOA    50
3       1       GUTR    INOA    1
4       5       AREB    GAPR    5

So, INOA=1 AFXR=2 GUTR=3 AREB=4 and GAPR=5

How would I go on about doing this ?

I've gotten as far as I can find all the unique labels and number them:

labels = pd.DataFrame(pd.unique(df[['orig', 'dest']].values.flatten()))
labels.index += 1

Gives:

1   INOA
2   AFXR
3   GUTR
4   AREB
5   GAPR

But I'm not sure how to apply that back to create the two new orig_id and dest_id columns in the original dataframe - and I'm not sure this is a way to go either.

binary01
  • 1,728
  • 2
  • 13
  • 27

3 Answers3

3
import numpy as np
uniques = {x:i for i, x in enumerate(np.unique(np.concatenate([df.orig.unique(), df.dest.unique()])),1)}
df['orig_id'] = df.orig.map(uniques)
df['dest_id'] = df.dest.map(uniques)
df

or

data=np.unique(np.array(df[["orig", "dest"]]).flatten())
index = pd.DataFrame(range(1,len(data)+1), index=data).to_dict()[0]

df['orig_id'] = df.orig.map(index)
df['dest_id'] = df.dest.map(index)
df
MoRe
  • 2,296
  • 2
  • 3
  • 23
3

So first we extract the unique value for both columns, then extract again the unique value for these two array, then make a function that return the index of the x in unique array.

import numpy as np

unique= pd.unique(np.append(df['orig'].unique(), df['dest'].unique()))

def get_unique_id(x):
    return np.where(unique==x)[0][0]+1

df['orig_id']= df['orig'].map(get_unique_id)
df['dest_id']= df['dest'].map(get_unique_id)

Yusuf Syam
  • 701
  • 1
  • 4
  • 18
3

Stack the columns to reshape, then factorize to encode the categorical values as numbers finally unstack and join with original dataframe:

s = df[['orig', 'dest']].stack()
s[:] = s.factorize()[0] + 1

s.unstack(1).add_suffix('_id').join(df)

  orig_id dest_id  orig  dest  count
0       1       2  INOA  AFXR    100
1       2       1  AFXR  INOA     50
2       3       1  GUTR  INOA      1
3       4       5  AREB  GAPR      5
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53