replacing all row values with shorter names in pandas

Question

I have a big data set with many rows. One column in that data set contains long string values. I want pandas to replace these values automatically with shorter names. What should I do?

My data is something like this:

And I want output like this:

You can use a similar function to the previous one. See my answer below. – SeaBean, Oct 09 '21 at 18:51

SeaBean · Accepted Answer · 2021-10-10T09:39:14.003

3

What you are looking for is the pd.factorize function, which encodes each distinct value as an integer code (0, 1, 2, ...), numbered in order of first appearance. Add 1 to each code and prefix it with 'C' to get the short names. You can use it as follows:

df['Col1'] = 'C' + pd.Series(pd.factorize(df['Col1'])[0] + 1, index=df.index, dtype='string')

Or, if your pandas version does not support the string dtype (it was added in pandas 1.0), use:

df['Col1'] = 'C' + pd.Series(pd.factorize(df['Col1'])[0] + 1, index=df.index).astype(str)
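To see what pd.factorize actually returns, here is a minimal sketch on the same sample data. It also builds a lookup table from each short label back to the original long value. The question does not ask for that table, but it is handy if the long names are needed later; the names `codes`, `uniques`, `labels` and `lookup` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['XXXXXXXXXXXXXX', 'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX',
                            'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX', 'ZZZZZZZZZZZZZZ']})

# codes[i] is the position of row i's value in `uniques`;
# by default (sort=False) `uniques` keeps the order of first appearance
codes, uniques = pd.factorize(df['Col1'])
print(codes)  # [0 1 0 1 0 2]

# Illustrative lookup table: short label -> original long value
labels = ['C' + str(i + 1) for i in range(len(uniques))]
lookup = dict(zip(labels, uniques))

# Same replacement as in the answer, aligned to df's own index
df['Col1'] = 'C' + pd.Series(codes + 1, index=df.index).astype(str)
```

One caveat on real data: by default, missing values get the code -1, so after `+ 1` they would become `C0` rather than a proper label.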

Demo

Data Input

import pandas as pd

data = {'Col1': ['XXXXXXXXXXXXXX', 'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX', 'YYYYYYYYYYYYYY', 'XXXXXXXXXXXXXX', 'ZZZZZZZZZZZZZZ']}
df = pd.DataFrame(data)

print(df)


             Col1
0  XXXXXXXXXXXXXX
1  YYYYYYYYYYYYYY
2  XXXXXXXXXXXXXX
3  YYYYYYYYYYYYYY
4  XXXXXXXXXXXXXX
5  ZZZZZZZZZZZZZZ

Output after applying the code above:

print(df)

  Col1
0   C1
1   C2
2   C1
3   C2
4   C1
5   C3
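Something to watch with these one-liners: a pd.Series built from the factorize codes without an explicit index gets a fresh 0..n-1 index, and column assignment aligns on the index. The demo uses the default index, so the labels line up. If `df` has been filtered or re-indexed, unmatched rows come out as NaN. This small sketch uses a made-up non-default index to show the difference:

```python
import pandas as pd

# Made-up data with a non-default index, as you might get after filtering rows
df = pd.DataFrame({'Col1': ['XXXX', 'YYYY', 'XXXX']}, index=[10, 11, 12])

codes = pd.factorize(df['Col1'])[0] + 1

# The new Series is indexed 0..2, which shares no labels with 10..12,
# so assignment reindexes it and every row becomes NaN
misaligned = 'C' + pd.Series(codes).astype(str)
df['bad'] = misaligned

# Passing index=df.index keeps each label on its own row
df['good'] = 'C' + pd.Series(codes, index=df.index).astype(str)
print(df)
```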

edited Oct 10 '21 at 09:39

answered Oct 09 '21 at 18:26

SeaBean

It is also possible to set the dtype instead of making another copy with `astype`. `df['Col1'] = 'C' + pd.Series(pd.factorize(df['Col1'])[0] + 1, dtype='string')` – Henry Ecker Oct 09 '21 at 18:59

I got the error "the data type string is not understood" with this code, so I changed it to `df['Col1'] = 'C' + pd.Series(pd.factorize(df['Col1'])[0] + 1).astype(str)` and it works perfectly. Thanks @SeaBean for your help. – Me0002 Oct 10 '21 at 09:36

score 0 · Answer 2 · answered Oct 09 '21 at 18:24


Use:

df['Col1'] = 'C' + (df.groupby('Col1').ngroup() + 1).astype(str)
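Both answers give the same labels on the sample data only because XXX, YYY and ZZZ happen to first appear in sorted order. By default (sort=True), groupby(...).ngroup() numbers groups in sorted key order, while pd.factorize numbers them in order of first appearance. This quick check uses a made-up column where the two orders differ:

```python
import pandas as pd

# Made-up values whose first-appearance order (b, a) differs from sorted order (a, b)
df = pd.DataFrame({'Col1': ['b', 'a', 'b']})

by_factorize = 'C' + pd.Series(pd.factorize(df['Col1'])[0] + 1, index=df.index).astype(str)
by_ngroup = 'C' + (df.groupby('Col1').ngroup() + 1).astype(str)
# sort=False makes ngroup follow first appearance, like factorize
by_ngroup_unsorted = 'C' + (df.groupby('Col1', sort=False).ngroup() + 1).astype(str)

print(by_factorize.tolist())        # ['C1', 'C2', 'C1']
print(by_ngroup.tolist())           # ['C2', 'C1', 'C2']
print(by_ngroup_unsorted.tolist())  # ['C1', 'C2', 'C1']
```

Either numbering is a valid short name. Pick the one whose order you want the labels to follow.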

Muhammad Hassan