
I have the following dataframe.

import pandas as pd

df = pd.DataFrame({'A':['abc1@abc.com','abc2@abc.com','abc3@abc.com','abc4@abc.com','abc2@abc.com','abc3@abc.com'],
                   'B':[4,5,4,5,5,4],
                   })

I need to generate a roll number for column A in the format

"prefix string + 10-digit number starting from 1 + suffix string"

If a value in column A is repeated, it should get the same roll number, so each unique value maps to exactly one roll number.

Expected Output:

              A  B       RollNumber
0  abc1@abc.com  4  ABC0000000001AB
1  abc2@abc.com  5  ABC0000000002AB
2  abc3@abc.com  4  ABC0000000003AB
3  abc4@abc.com  5  ABC0000000004AB
4  abc2@abc.com  5  ABC0000000002AB
5  abc3@abc.com  4  ABC0000000003AB
Yog

1 Answer


Use a list comprehension with zero-padding:

# Python 3.6+ (f-string; '010' means zero-pad to width 10)
df['RollNumber'] = [f'ABC{x:010}AB' for x in range(1, len(df) + 1)]
# Older Python 3 (str.format)
#df['RollNumber'] = ['ABC{0:010d}AB'.format(x) for x in range(1, len(df) + 1)]
print(df)

              A  B       RollNumber
0  abc1@abc.com  4  ABC0000000001AB
1  abc2@abc.com  5  ABC0000000002AB
2  abc3@abc.com  4  ABC0000000003AB
3  abc4@abc.com  5  ABC0000000004AB
4  abc2@abc.com  5  ABC0000000005AB
5  abc3@abc.com  4  ABC0000000006AB
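In `f'ABC{x:010}AB'`, the format spec `010` means "minimum width 10, padded with leading zeros". A small standalone check, using a hypothetical helper `roll_number` that is not part of the answer above:

```python
def roll_number(n: int, prefix: str = 'ABC', suffix: str = 'AB') -> str:
    """Hypothetical helper: zero-pad n to 10 digits between prefix and suffix."""
    # '010' = minimum field width 10, filled with leading zeros.
    return f'{prefix}{n:010}{suffix}'

print(roll_number(1))            # ABC0000000001AB
print(roll_number(42))           # ABC0000000042AB
# The width is a minimum: an 11-digit number is not truncated.
print(roll_number(12345678901))  # ABC12345678901AB
# str.zfill(10) produces the same digits as the '010' format spec.
print(str(42).zfill(10) == f'{42:010}')  # True
```

This is also why the EDIT below can use `str.zfill(10)` on a string column instead of the format spec.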

EDIT: To give identical values in column A the same roll number, use `factorize` with `Series.str.zfill`:

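# factorize gives 0-based codes, so repeated emails share the same code
# index=df.index keeps s aligned with df even if its index is not 0..n-1
s = pd.Series(pd.factorize(df['A'])[0] + 1, index=df.index).astype(str).str.zfill(10)
df['RollNumber'] = 'ABC' + s + 'AB'
print(df)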
              A  B       RollNumber
0  abc1@abc.com  4  ABC0000000001AB
1  abc2@abc.com  5  ABC0000000002AB
2  abc3@abc.com  4  ABC0000000003AB
3  abc4@abc.com  5  ABC0000000004AB
4  abc2@abc.com  5  ABC0000000002AB
5  abc3@abc.com  4  ABC0000000003AB
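By default `pd.factorize` numbers values in order of first appearance (`sort=False`), so column A does not need to be sorted; with `sort=True` the codes follow the sorted unique values instead. A sketch on made-up data with a deliberately non-default index, assuming pandas is installed; passing `index=df.index` to `pd.Series` keeps the new column aligned with the rows:

```python
import pandas as pd

# Illustrative data: unsorted, with a repeat, and an index that is not 0..n-1.
df = pd.DataFrame({'A': ['abc4@abc.com', 'abc2@abc.com', 'abc1@abc.com', 'abc2@abc.com']},
                  index=[10, 11, 12, 13])

# sort=False (default): codes follow order of first appearance.
codes, uniques = pd.factorize(df['A'])
print(codes)         # [0 1 2 1]

# sort=True: codes follow the sorted order of the unique values.
codes_sorted, _ = pd.factorize(df['A'], sort=True)
print(codes_sorted)  # [2 1 0 1]

# Build roll numbers on df's own index so the assignment lines up row by row.
s = pd.Series(codes + 1, index=df.index).astype(str).str.zfill(10)
df['RollNumber'] = 'ABC' + s + 'AB'
print(df)
```

Without `index=df.index`, the new Series would get a 0..3 index, and assigning it to a frame indexed 10..13 would fill the column with NaN.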
jezrael
  • What if I have repeated values? Check the expected output now. – Yog Sep 24 '18 at 06:47
  • How can I use `df['RollNumber'] = [f'ABC{x:010}AB' for x in range(1, len(df) + 1)]` so that the roll numbers start from 20? – Yog Sep 25 '18 at 12:00
  • @Yog - Use `df['RollNumber'] = [f'ABC{x:010}AB' for x in range(20, len(df) + 21)]` – jezrael Sep 25 '18 at 12:01
  • Getting this error: ValueError: Length of values does not match length of index – Yog Sep 25 '18 at 12:02
  • @Yog - Oops, not tested. You need `df['RollNumber'] = [f'ABC{x:010}AB' for x in range(20, len(df) + 20)]` – jezrael Sep 25 '18 at 12:04
  • What if df['A'] is not in ascending order, e.g. `[abc1,abc2,abc4,abc2]`? – Pyd Oct 01 '18 at 09:30
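The two follow-up questions in the comments (starting from 20, and unsorted input) can be combined in one sketch. `make_roll_numbers` is a hypothetical helper, not a pandas API, and it assumes column A has no missing values, because `pd.factorize` gives NaN the code -1 by default:

```python
import pandas as pd

def make_roll_numbers(col: pd.Series, start: int = 1, width: int = 10,
                      prefix: str = 'ABC', suffix: str = 'AB',
                      sort: bool = False) -> pd.Series:
    """Hypothetical helper: equal values share a roll number, numbering from `start`."""
    # 0-based codes: order of first appearance (sort=False) or sorted order (sort=True).
    codes, _ = pd.factorize(col, sort=sort)
    # Shift to the requested start, zero-pad, and reuse col.index so assignment aligns.
    numbers = pd.Series(codes + start, index=col.index).astype(str).str.zfill(width)
    return prefix + numbers + suffix

df = pd.DataFrame({'A': ['abc1@abc.com', 'abc2@abc.com', 'abc4@abc.com', 'abc2@abc.com']})
df['RollNumber'] = make_roll_numbers(df['A'], start=20)
print(df)
```

Because the codes depend only on first appearance, an unsorted column still gets one number per distinct value. For the one-number-per-row version, `range(start, start + len(df))` yields exactly `len(df)` values; `range(20, len(df) + 21)` yields one extra value, which is what raised the length mismatch.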