concatenate two columns and pad with zeroes to have constant length

Question

I have the following dataframe:

 A       B
645     72436
645     73311
543     432666
432     6443
432     765321

I need to create a column C, with a constant length of 10 digits. C should be created by concatenating B to A and padding with zeros between the two concatenated columns.
That is to say:

 A          B          C
645        72436      6450072436
645        73311      6450073311
543        432666     5430432666
432        6443       4320006443
432        765321     4320765321

The concatenation of A and B will always be 10 digits or less, so there are no cases in which C will have more than 10 digits.

If `A` always has 3 digits, just pad `B` with the required number of zeros, i.e. `df['C'] = df['A'].astype(str) + df['B'].astype(str).str.zfill(7)` — Ch3steR, Jan 04 '21 at 11:12
Using a little bit of math: `df['A']*10**7 + df['B']`, but this only works if `A` always has 3 digits — Ch3steR, Jan 04 '21 at 11:19
@MichaelSzczesny No, unfortunately not. I believe it will only be either 3 or 4 digits — CAPSLOCK, Jan 04 '21 at 11:22
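The two approaches from the comments can be checked side by side. This is a minimal sketch that rebuilds the question's sample frame and assumes, as the comments do, that A always has exactly 3 digits. The extra pair A = 1234, B = 56 is a made-up 4-digit case that shows why a fixed width of 7 stops working.

```python
import pandas as pd

# Sample data from the question; every A here has 3 digits.
df = pd.DataFrame({'A': [645, 645, 543, 432, 432],
                   'B': [72436, 73311, 432666, 6443, 765321]})

# Comment 1: zero-pad B to 7 characters, then prepend A as a string.
via_zfill = df['A'].astype(str) + df['B'].astype(str).str.zfill(7)

# Comment 2: shift A left by 7 decimal places, then add B.
via_math = df['A'] * 10**7 + df['B']

print(via_zfill.tolist())
print(via_math.tolist())

# Hypothetical 4-digit A: the fixed shift of 7 gives 11 digits instead of 10.
too_long = 1234 * 10**7 + 56
print(len(str(too_long)))
```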

Ch3steR · Accepted Answer · 2021-01-04T11:33:57.607

First, get the number of digits in A with .str.len, then use np.power with base 10 to build the multiplier that shifts A to the left:

import numpy as np

num_of_digits = df['A'].astype(str).str.len()                      # digits in each A
df['C'] = df['A'].mul(np.power(10, 10 - num_of_digits)) + df['B']  # shift A left, then add B

     A       B           C
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321

Details

num_of_digits = df['A'].astype(str).str.len() # This gives number of digits
# 0    3
# 1    3
# 2    3
# 3    3
# 4    3
# Name: A, dtype: int64

np.power(10, 10-num_of_digits) # If A has 3 digits, the multiplier is 10^(10-3),
                               # i.e. 10^7
# Small example of `np.power`
# np.power(10, [2, 3, 4])
#  array([  100,  1000, 10000], dtype=int32)
#          10^2   10^3   10^4

df['A'].mul(np.power(10, 10-num_of_digits))
# 0    6450000000
# 1    6450000000
# 2    5430000000
# 3    4320000000
# 4    4320000000
# Name: A, dtype: int64

Now add df['B'] to get the desired result.
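Put together, the accepted answer's steps can be run end to end as below. This is a sketch rather than the answerer's exact code: the last row (A = 1234, B = 56) is a hypothetical 4-digit case added to show that the digit count is computed per row, since the asker said A may have 3 or 4 digits.

```python
import numpy as np
import pandas as pd

# Question data, with the last row swapped for a hypothetical 4-digit A.
df = pd.DataFrame({'A': [645, 645, 543, 432, 1234],
                   'B': [72436, 73311, 432666, 6443, 56]})

num_of_digits = df['A'].astype(str).str.len()   # 3, 3, 3, 3, 4
multiplier = np.power(10, 10 - num_of_digits)   # 10**7 for 3 digits, 10**6 for 4
df['C'] = df['A'].mul(multiplier) + df['B']     # zeros fill the gap between A and B
print(df)
```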

jezrael · Answer 2 · 2021-01-04T11:27:51.357

Repeat '0' as many times as needed to bring each joined row up to 10 characters:

df1 = df[['A','B']].astype(str)
s = df1.apply(lambda x: ''.join(['0'] * (10 - len(x.A + x.B))), axis=1)

df['new'] = df1.A + s + df1.B
print (df)
     A       B         new
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321
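If the row-wise apply becomes slow, the same string logic can be written as a plain list comprehension with a nested f-string format spec that zero-pads B to whatever width A leaves free. This is an alternative sketch, not part of the answer above; TOTAL_WIDTH is a name introduced here for the required length of 10.

```python
import pandas as pd

df = pd.DataFrame({'A': [645, 645, 543, 432, 432],
                   'B': [72436, 73311, 432666, 6443, 765321]})

TOTAL_WIDTH = 10  # required length of the new column (assumed constant)

# For each row, pad B with leading zeros to TOTAL_WIDTH minus the length of A.
df['new'] = [f"{a}{b:0{TOTAL_WIDTH - len(str(a))}d}"
             for a, b in zip(df['A'], df['B'])]
print(df)
```

The result is a string column; wrap it in `.astype('int64')` if an integer column is needed.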

EDIT: To improve performance, work with integers instead: use np.log10, convert the result to integers and add 1 to get the number of digits, then multiply A by the matching power of 10:

a = np.log10(df['A']).astype(int).add(1)

df['new'] = df.A * 10 ** (10 - a) + df.B
print (df)
     A       B         new
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321

Hi, Thanks. This is very neat. However my dataframe is (potentially) very big and I was wondering if there is a way to count the number of digits of an integer (so to avoid the conversion to string and then conversion back to integer) — CAPSLOCK, Jan 04 '21 at 11:17
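One way to count digits without strings or floating-point log10 (a technique swapped in here, not taken from the thread) is to compare each value against a table of powers of 10 with np.searchsorted: for a positive integer v, the number of powers 10**0 ... 10**18 that are <= v equals its digit count. The sketch assumes A holds positive int64 values; count_digits is a hypothetical helper name, and the sample values are made up to cover 1-, 3- and 4-digit cases.

```python
import numpy as np
import pandas as pd

def count_digits(values):
    """Digit count of each positive integer, using only integer comparisons."""
    powers = 10 ** np.arange(19, dtype=np.int64)  # 1, 10, ..., 10**18 (fits in int64)
    # side='right' counts the powers <= v, which is the digit count for v >= 1.
    return np.searchsorted(powers, values, side='right')

# Hypothetical data mixing 1-, 3- and 4-digit values of A.
df = pd.DataFrame({'A': [645, 9, 1000, 999, 4321],
                   'B': [72436, 5, 123, 1, 99]})

digits = count_digits(df['A'].to_numpy())
df['C'] = df['A'] * 10 ** (10 - digits) + df['B']
print(df)
```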

Dani Mesejo · Answer 3 · 2021-01-04T11:53:37.453


You could do:

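import pandas as pd

A = df['A'].astype('str')
B = df['B'].astype('str')

missing = 10 - (A.str.len() + B.str.len())  # zeros needed per row

# Give the padding df's index so it lines up with A and B row by row.
df['C'] = A + pd.Series(['0' * e for e in missing], index=df.index) + B
print(df)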
print(df)

Output

     A       B           C
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321
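With the string approach, the padding Series has to line up with df's index: a pd.Series built from a plain list gets a default 0..n-1 index, and pandas aligns on index labels when adding Series. The sketch below uses a hypothetical non-default index (such as one left behind by filtering) to show that a list-built padding turns every row into NaN, while passing index=df.index keeps the rows together.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1.
df = pd.DataFrame({'A': [645, 543, 432], 'B': [72436, 432666, 6443]},
                  index=[10, 20, 30])

A = df['A'].astype(str)
B = df['B'].astype(str)
missing = 10 - (A.str.len() + B.str.len())

# Padding with a default index: labels 0, 1, 2 never match 10, 20, 30.
misaligned = A + pd.Series(['0' * n for n in missing]) + B

# Padding built on df's own index: rows match up as intended.
zeros = pd.Series(['0' * n for n in missing], index=df.index)
df['C'] = A + zeros + B
print(misaligned)
print(df)
```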

Using only arithmetic, you could do the following (very similar to @jezrael's answer):

import numpy as np
missing = 10 - (np.log10(df['A']).astype(int) + 1)
tens = 10 ** missing

df['C'] = (df['A'] * tens) + df['B']
print(df)

Output

     A       B           C
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321

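But beware of solutions involving log10: the values are converted to floating point, so the digit count can come out wrong for very large numbers, and log10(0) is -inf (see this).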
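The log10 caveat comes from the conversion to float64, which has a 53-bit mantissa and cannot hold every large integer exactly. A small sketch with a hypothetical 18-digit value (far larger than anything in the question) shows the float rounding up to 10**18, which has 19 digits, so a digit count derived from that float can be off by one.

```python
import numpy as np

big = 10**18 - 1                          # hypothetical 18-digit value
as_float = np.float64(big)                # what np.log10 actually operates on

exact_digits = len(str(big))              # digit count of the true integer
float_digits = len(str(int(as_float)))    # digit count after the float rounding
log_digits = int(np.log10(as_float)) + 1  # digit count as the log10 answers compute it

print(exact_digits, float_digits, log_digits)
```

For the 3- and 4-digit values of A in the question this is not an issue; it matters once values approach the limits of float64 precision.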

answered Jan 04 '21 at 11:13 by Dani Mesejo, edited Jan 04 '21 at 11:53

Hi, Thanks. My dataframe is (potentially) very big and I was wondering if there is a way to count the number of digits of an integer (so to avoid the conversion to string and then conversion back to integer) – CAPSLOCK Jan 04 '21 at 11:17
Your second solution seems very similar to my second one. – jezrael Jan 04 '21 at 11:40
