124

I have a pandas data frame where the first 3 columns are strings:

         ID        text1    text 2
0       2345656     blah      blah
1          3456     blah      blah
2        541304     blah      blah        
3        201306       hi      blah        
4   12313201308    hello      blah         

I want to add leading zeros to the ID:

                ID    text1    text 2
0  000000002345656     blah      blah
1  000000000003456     blah      blah
2  000000000541304     blah      blah        
3  000000000201306       hi      blah        
4  000012313201308    hello      blah 

I have tried:

df['ID'] = df.ID.zfill(15)
df['ID'] = '{0:0>15}'.format(df['ID'])
jpp
  • 159,742
  • 34
  • 281
  • 339
jgaw
  • 1,624
  • 2
  • 14
  • 17

8 Answers8

148

str attribute contains most of the methods in string.

df['ID'] = df['ID'].str.zfill(15)

See more: http://pandas.pydata.org/pandas-docs/stable/text.html

Guangyang Li
  • 2,711
  • 1
  • 24
  • 27
  • 4
    added df['ID'] = df['ID'].astype(str) to handle when ID is numeric, the run df['ID'] = df['ID'].str.zfill(15) – Je Je Mar 14 '21 at 12:29
  • For a more general and customizable solution, one can use [`str.pad`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.pad.html); you can have a look at this [answer](https://stackoverflow.com/a/64806147/7465462) – Ric S Jul 15 '21 at 10:27
  • I get n error running this that int doesn't have attribute string. – max_settings Aug 30 '23 at 02:41
121

Try:

df['ID'] = df['ID'].apply(lambda x: '{0:0>15}'.format(x))

or even

df['ID'] = df['ID'].apply(lambda x: x.zfill(15))
Rohit
  • 3,087
  • 3
  • 20
  • 29
19

It can be achieved with a single line while initialization. Just use converters argument.

df = pd.read_excel('filename.xlsx', converters={'ID': '{:0>15}'.format})

so you'll reduce the code length by half :)

PS: read_csv have this argument as well.

Community
  • 1
  • 1
Danil
  • 4,781
  • 1
  • 35
  • 50
19

With Python 3.6+, you can also use f-strings:

df['ID'] = df['ID'].map(lambda x: f'{x:0>15}')

Performance is comparable or slightly worse versus df['ID'].map('{:0>15}'.format). On the other hand, f-strings permit more complex output, and you can use them more efficiently via a list comprehension.

Performance benchmarking

# Python 3.6.0, Pandas 0.19.2

df = pd.concat([df]*1000)

%timeit df['ID'].map('{:0>15}'.format)                  # 4.06 ms per loop
%timeit df['ID'].map(lambda x: f'{x:0>15}')             # 5.46 ms per loop
%timeit df['ID'].astype(str).str.zfill(15)              # 18.6 ms per loop

%timeit list(map('{:0>15}'.format, df['ID'].values))    # 7.91 ms per loop
%timeit ['{:0>15}'.format(x) for x in df['ID'].values]  # 7.63 ms per loop
%timeit [f'{x:0>15}' for x in df['ID'].values]          # 4.87 ms per loop
%timeit [str(x).zfill(15) for x in df['ID'].values]     # 21.2 ms per loop

# check results are the same
x = df['ID'].map('{:0>15}'.format)
y = df['ID'].map(lambda x: f'{x:0>15}')
z = df['ID'].astype(str).str.zfill(15)

assert (x == y).all() and (x == z).all()
jpp
  • 159,742
  • 34
  • 281
  • 339
  • 1
    tried this with a 6gb file, works way faster than other methods, also more efficient. thanks @jpp – anky Oct 24 '18 at 19:22
  • You can even use `.map` instead of `.apply` – rpanai Nov 21 '18 at 13:04
  • @user32185, I *believe* they are interchangeable here and perform equally. Do you have a reason to think otherwise? – jpp Nov 21 '18 at 13:05
  • 1
    In some examples it's faster. Try `df['text1'].map('{:015}'.format)`. – rpanai Nov 21 '18 at 13:15
  • 1
    @user32185, Thank you, I see a very marginal improvement changing `apply` vs `map`, I'm not sure if that's setup dependent. I've updated timings and advice in my answer as `str.format` without `lambda` seems to win. – jpp Nov 21 '18 at 13:47
  • It seems to me that you replaced all `.apply`s with `.map`s. I think it will be better have all the options. – rpanai Nov 21 '18 at 13:50
  • @user32185, I'd rather focus on string manipulation functions. Your question is another question, but a good one *if you see a big difference* (which I don't). If you think it's important, you can [ask as a new question](https://stackoverflow.com/questions/ask) so we have a canonical to point to with a reasoning. – jpp Nov 21 '18 at 13:52
  • I don't have any question. I suggested you a faster method and you could just edit and add to your answer. You first claimed it was not faster, then said it's just marginally faster and finally replaced your original solution with the one I suggested. – rpanai Nov 21 '18 at 13:57
  • @user32185, I'm saying it's irrelevant (a few microseconds prove little). I just happened to use `map` last (by chance, not by intention). To me, I can use either without changing the purpose or impact. If you think it *is* relevant, then I suggest you point to some evidence or reasoning. – jpp Nov 21 '18 at 13:58
14

If you are encountering the error:

Pandas error: Can only use .str accessor with string values, which use np.object_ dtype in pandas

df['ID'] = df['ID'].astype(str).str.zfill(15)
Deskjokey
  • 568
  • 1
  • 7
  • 18
9

If you want a more customizable solution to this problem, you can try pandas.Series.str.pad

df['ID'] = df['ID'].astype(str).str.pad(15, side='left', fillchar='0')

str.zfill(n) is a special case equivalent to str.pad(n, side='left', fillchar='0')

Ric S
  • 9,073
  • 3
  • 25
  • 51
1

rjust worked for me:

df['ID']= df['ID'].str.rjust(15,'0')
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
mikecbos
  • 47
  • 6
1

ADD Leading Zeros to the Numeric Column in pandas:

df['ID']=df['ID'].apply(lambda x: '{0:0>15}'.format(x))

ADD Leading Zeros to the character column in pandas:

Method1:using Zfill

df['ID'] = df['ID'].str.zfill(15)

Method2:using rjust() function

df['ID']=df['ID'].str.rjust(15, "0")

Source : https://www.datasciencemadesimple.com/add-leading-preceding-zeros-python/

karaimadai
  • 101
  • 1
  • 6