Add Leading Zeros to Strings in Pandas Dataframe

Question

I have a pandas data frame where the first 3 columns are strings:

         ID        text1    text 2
0       2345656     blah      blah
1          3456     blah      blah
2        541304     blah      blah        
3        201306       hi      blah        
4   12313201308    hello      blah

I want to add leading zeros to the ID:

                ID    text1    text 2
0  000000002345656     blah      blah
1  000000000003456     blah      blah
2  000000000541304     blah      blah        
3  000000000201306       hi      blah        
4  000012313201308    hello      blah

I have tried:

df['ID'] = df.ID.zfill(15)
df['ID'] = '{0:0>15}'.format(df['ID'])

can you please explain how ```'{0:0>15}'.format(df['ID'])``` works. — spd, Mar 25 '22 at 04:04

score 148 · Answer 1 · answered Jul 22 '15 at 04:18

148

str attribute contains most of the methods in string.

df['ID'] = df['ID'].str.zfill(15)

See more: http://pandas.pydata.org/pandas-docs/stable/text.html

answered Jul 22 '15 at 04:18

Guangyang Li

2,711
1
24
27

4

added df['ID'] = df['ID'].astype(str) to handle when ID is numeric, the run df['ID'] = df['ID'].str.zfill(15) – Je Je Mar 14 '21 at 12:29
For a more general and customizable solution, one can use [`str.pad`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.pad.html); you can have a look at this [answer](https://stackoverflow.com/a/64806147/7465462) – Ric S Jul 15 '21 at 10:27
I get n error running this that int doesn't have attribute string. – max_settings Aug 30 '23 at 02:41

score 121 · Accepted Answer · answered May 23 '14 at 18:42

121

Try:

df['ID'] = df['ID'].apply(lambda x: '{0:0>15}'.format(x))

or even

df['ID'] = df['ID'].apply(lambda x: x.zfill(15))

answered May 23 '14 at 18:42

Rohit

3,087
3
20
29

32

No need for the `lambda` on the first one: `apply('{:0>15}'.format)` should work too. – DSM May 23 '14 at 18:58
1

@DSM Nice. Did not know that. – Rohit May 23 '14 at 19:41
It can be achieved with single line while initialization, see my answer below. – Danil May 02 '18 at 15:27
1

@Rohit how would the code change if the string had decimals or letters in it? i.e. what is required for converting `2.0a` to `02.0a` using a `lambda x: x.zfill(2)`? – Andreuccio Apr 10 '19 at 09:31
Can you explain why you use 15 , what is the significance of the number, what if I only want to add one `0`? – Murtaza Haji Jun 04 '20 at 02:30
Hi @MurtazaHaji 15 is the desired total length of the string – Robert Alexander Dec 06 '21 at 16:01

score 19 · Answer 3 · edited Jun 20 '20 at 09:12

19

It can be achieved with a single line while initialization. Just use converters argument.

df = pd.read_excel('filename.xlsx', converters={'ID': '{:0>15}'.format})

so you'll reduce the code length by half :)

PS: read_csv have this argument as well.

edited Jun 20 '20 at 09:12

Community

1
1

answered May 02 '18 at 15:26

Danil

4,781
1
35
50

jpp · Answer 4 · 2018-11-21T13:47:19.767

19

With Python 3.6+, you can also use f-strings:

df['ID'] = df['ID'].map(lambda x: f'{x:0>15}')

Performance is comparable or slightly worse versus df['ID'].map('{:0>15}'.format). On the other hand, f-strings permit more complex output, and you can use them more efficiently via a list comprehension.

Performance benchmarking

# Python 3.6.0, Pandas 0.19.2

df = pd.concat([df]*1000)

%timeit df['ID'].map('{:0>15}'.format)                  # 4.06 ms per loop
%timeit df['ID'].map(lambda x: f'{x:0>15}')             # 5.46 ms per loop
%timeit df['ID'].astype(str).str.zfill(15)              # 18.6 ms per loop

%timeit list(map('{:0>15}'.format, df['ID'].values))    # 7.91 ms per loop
%timeit ['{:0>15}'.format(x) for x in df['ID'].values]  # 7.63 ms per loop
%timeit [f'{x:0>15}' for x in df['ID'].values]          # 4.87 ms per loop
%timeit [str(x).zfill(15) for x in df['ID'].values]     # 21.2 ms per loop

# check results are the same
x = df['ID'].map('{:0>15}'.format)
y = df['ID'].map(lambda x: f'{x:0>15}')
z = df['ID'].astype(str).str.zfill(15)

assert (x == y).all() and (x == z).all()

edited Nov 21 '18 at 13:47

answered Jul 13 '18 at 15:12

jpp

159,742
34
281
339

1

tried this with a 6gb file, works way faster than other methods, also more efficient. thanks @jpp – anky Oct 24 '18 at 19:22
You can even use `.map` instead of `.apply` – rpanai Nov 21 '18 at 13:04
@user32185, I *believe* they are interchangeable here and perform equally. Do you have a reason to think otherwise? – jpp Nov 21 '18 at 13:05
1

In some examples it's faster. Try `df['text1'].map('{:015}'.format)`. – rpanai Nov 21 '18 at 13:15
1

@user32185, Thank you, I see a very marginal improvement changing `apply` vs `map`, I'm not sure if that's setup dependent. I've updated timings and advice in my answer as `str.format` without `lambda` seems to win. – jpp Nov 21 '18 at 13:47
It seems to me that you replaced all `.apply`s with `.map`s. I think it will be better have all the options. – rpanai Nov 21 '18 at 13:50
@user32185, I'd rather focus on string manipulation functions. Your question is another question, but a good one *if you see a big difference* (which I don't). If you think it's important, you can [ask as a new question](https://stackoverflow.com/questions/ask) so we have a canonical to point to with a reasoning. – jpp Nov 21 '18 at 13:52
I don't have any question. I suggested you a faster method and you could just edit and add to your answer. You first claimed it was not faster, then said it's just marginally faster and finally replaced your original solution with the one I suggested. – rpanai Nov 21 '18 at 13:57
@user32185, I'm saying it's irrelevant (a few microseconds prove little). I just happened to use `map` last (by chance, not by intention). To me, I can use either without changing the purpose or impact. If you think it *is* relevant, then I suggest you point to some evidence or reasoning. – jpp Nov 21 '18 at 13:58

score 14 · Answer 5 · answered Sep 16 '18 at 04:52

14

If you are encountering the error:

Pandas error: Can only use .str accessor with string values, which use np.object_ dtype in pandas

df['ID'] = df['ID'].astype(str).str.zfill(15)

answered Sep 16 '18 at 04:52

Deskjokey

568
1
7
18

Ric S · Answer 6 · 2020-11-17T13:22:01.350

9

If you want a more customizable solution to this problem, you can try pandas.Series.str.pad

df['ID'] = df['ID'].astype(str).str.pad(15, side='left', fillchar='0')

str.zfill(n) is a special case equivalent to str.pad(n, side='left', fillchar='0')

edited Nov 17 '20 at 13:22

answered Nov 12 '20 at 14:48

Ric S

9,073
3
25
51

score 1 · Answer 7 · edited Mar 03 '22 at 21:30

1

rjust worked for me:

df['ID']= df['ID'].str.rjust(15,'0')

edited Mar 03 '22 at 21:30

Henry Ecker

34,399
18
41
57

answered Mar 03 '22 at 00:29

mikecbos

47
6

karaimadai · Answer 8 · 2023-06-17T08:42:43.337

1

ADD Leading Zeros to the Numeric Column in pandas:

df['ID']=df['ID'].apply(lambda x: '{0:0>15}'.format(x))

ADD Leading Zeros to the character column in pandas:

Method1:using Zfill

df['ID'] = df['ID'].str.zfill(15)

Method2:using rjust() function

df['ID']=df['ID'].str.rjust(15, "0")

Source : https://www.datasciencemadesimple.com/add-leading-preceding-zeros-python/

edited Jun 17 '23 at 08:42

answered Jun 17 '23 at 08:34

karaimadai

101
1
6

Add Leading Zeros to Strings in Pandas Dataframe

8 Answers8

Performance benchmarking

Linked

Related