0

I have this sample data in a cell:

EmployeeID

2016-CT-1028
2016-CT-1028
2017-CT-1063
2017-CT-1063
2015-CT-948
2015-CT-948

How can I insert a 0 into an ID such as 2015-CT-948 so that it becomes 2015-CT-0948? I tried this code:

pattern = re.compile(r'(\d\d+)-(\w\w)-(\d\d\d)')
newlist = list(filter(pattern.match, idList))

The idea was to find the IDs that match the regex pattern and then add the 0 with zfill(), but it's not working. Can someone give me an idea of how to do this? Is there any way to do it with regex or in pandas? Thank you!

N.Omugs
    Possible duplicate of [Display number with leading zeros](https://stackoverflow.com/questions/134934/display-number-with-leading-zeros) – Georgy Oct 16 '18 at 09:13

5 Answers

4

Here is one approach using zfill:

import pandas as pd

def custZfill(val):
    val = val.split("-")
    # Alternative: split only on the last hyphen
    # val = val.rsplit("-", 1)
    val[-1] = val[-1].zfill(4)
    return "-".join(val)

df = pd.DataFrame({"EmployeeID": ["2016-CT-1028", "2016-CT-1028", 
                                  "2017-CT-1063", "2017-CT-1063", 
                                  "2015-CT-948", "2015-CT-948"]})
print(df["EmployeeID"].apply(custZfill))

Output:

0    2016-CT-1028
1    2016-CT-1028
2    2017-CT-1063
3    2017-CT-1063
4    2015-CT-0948
5    2015-CT-0948
Name: EmployeeID, dtype: object
jezrael
Rakesh
  • What if three of your answers are all correct? Do I have to choose which one to mark as the accepted answer? – N.Omugs Oct 16 '18 at 07:12
2

With pandas it can be solved with split instead of regex:

df['EmployeeID'].apply(lambda x: '-'.join(x.split('-')[:-1] + [x.split('-')[-1].zfill(4)]))
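The same split-and-pad idea can probably also be written with pandas' vectorized string methods instead of apply. This is only a sketch, assuming every ID contains at least one hyphen and ends in the numeric part; the three-row sample frame is rebuilt here just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"EmployeeID": ["2016-CT-1028", "2017-CT-1063", "2015-CT-948"]})

# Split once from the right: column 0 holds the prefix, column 1 the trailing number.
parts = df["EmployeeID"].str.rsplit("-", n=1, expand=True)

# Left-pad the trailing number to four characters and reassemble the ID.
df["EmployeeID"] = parts[0] + "-" + parts[1].str.zfill(4)

print(df["EmployeeID"].tolist())
# ['2016-CT-1028', '2017-CT-1063', '2015-CT-0948']
```

Splitting only on the last hyphen keeps the prefix intact even if it ever contains extra hyphens.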
Shaido
2

In pandas, you could use str.replace

df['EmployeeID'] = df.EmployeeID.str.replace(r'-(\d{3})$', r'-0\1', regex=True)


# Output:

0    2016-CT-1028
1    2016-CT-1028
2    2017-CT-1063
3    2017-CT-1063
4    2015-CT-0948
5    2015-CT-0948
Name: EmployeeID, dtype: object
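The pattern above only fixes IDs whose last part has exactly three digits. If shorter numbers can occur, one possible variation is to pass a callable as the replacement and let zfill do the padding. This is a sketch under that assumption; the ID "2015-CT-48" is a made-up value added only to show the two-digit case.

```python
import pandas as pd

# "2015-CT-48" is a hypothetical ID, not part of the original sample data.
df = pd.DataFrame({"EmployeeID": ["2016-CT-1028", "2015-CT-948", "2015-CT-48"]})

# Match the trailing run of digits and left-pad it to four characters.
padded = df["EmployeeID"].str.replace(
    r"(\d+)$", lambda m: m.group(1).zfill(4), regex=True
)

print(padded.tolist())
# ['2016-CT-1028', '2015-CT-0948', '2015-CT-0048']
```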
Abhi
1

If the format of the IDs is strictly defined, you can also use a simple list comprehension:

ids = [
'2017-CT-1063',
'2015-CT-948',
'2015-CT-948'
]

# Use `s` rather than `id` to avoid shadowing the built-in id()
new_ids = [s if len(s) == 12 else s[:8] + '0' + s[8:] for s in ids]
print(new_ids)
# ['2017-CT-1063', '2015-CT-0948', '2015-CT-0948']
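If the lengths are not guaranteed, a plain-Python regex version in the spirit of the question's attempt might look like the sketch below. Note that filter(pattern.match, idList) only selects matching strings, and three \d in a row also match the first three digits of a four-digit number, so that filter cannot single out the short IDs. The helper name pad_last_field and its width parameter are hypothetical, chosen for this example.

```python
import re

ids = ['2017-CT-1063', '2015-CT-948', '2015-CT-948']

def pad_last_field(emp_id, width=4):
    """Left-pad the trailing digits of an ID such as '2015-CT-948' with zeros."""
    # Hypothetical helper: substitute only the final run of digits.
    return re.sub(r'\d+$', lambda m: m.group(0).zfill(width), emp_id)

new_ids = [pad_last_field(i) for i in ids]
print(new_ids)
# ['2017-CT-1063', '2015-CT-0948', '2015-CT-0948']
```

Calling zfill on the matched digits rather than on the whole string is the key point, since zfill on the full ID would add zeros at the very front.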
Simas Joneliunas
1

Here's a one-liner:

df['EmployeeID'].apply(lambda x: '-'.join(xi if i != 2 else '%04d' % int(xi) for i, xi in enumerate(x.split('-'))))
Gerges