I have the following dataframe:

 A       B
645     72436
645     73311
543     432666
432     6443
432     765321

I need to create a column C, with a constant length of 10 digits. C should be created by concatenating B to A and padding with zeros between the two concatenated columns.
That is to say:

 A          B          C
645        72436      6450072436
645        73311      6450073311
543        432666     5430432666
432        6443       4320006443
432        765321     4320765321

The concatenation of A and B will always be 10 digits or less, so there are no cases in which C will have more than 10 digits.

Ch3steR
CAPSLOCK

3 Answers

First, get the number of digits in A using .str.len, then use np.power with base 10 to shift A into the leading digits of a 10-digit number:

import numpy as np

num_of_digits = df['A'].astype(str).str.len()
df['C'] = df['A'].mul(np.power(10, 10-num_of_digits)) + df['B']

     A       B           C
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321

Details

num_of_digits = df['A'].astype(str).str.len() # This gives number of digits
# 0    3
# 1    3
# 2    3
# 3    3
# 4    3
# Name: A, dtype: int64

np.power(10, 10-num_of_digits) # If a number has 3 digits, multiply it by 10^(10-3),
                               # i.e. 10^7
# Small example of `np.power`
# np.power(10, [2, 3, 4])
#  array([  100,  1000, 10000], dtype=int32)
#          10^2   10^3   10^4

df['A'].mul(np.power(10, 10-num_of_digits))
# 0    6450000000
# 1    6450000000
# 2    5430000000
# 3    4320000000
# 4    4320000000
# Name: A, dtype: int64

Now add df['B'] to it to get the desired result.
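
Put together as a self-contained script, the steps above might look like the sketch below. The frame is rebuilt from the question's sample data, and the explicit int64 dtype is an assumption added here so the result does not depend on the platform's default integer width.

```python
import numpy as np
import pandas as pd

# Sample data from the question; int64 is chosen explicitly (assumption)
df = pd.DataFrame({
    "A": [645, 645, 543, 432, 432],
    "B": [72436, 73311, 432666, 6443, 765321],
}, dtype="int64")

# Number of digits in A, via the string representation
num_of_digits = df["A"].astype(str).str.len()

# Factor that moves A into the leading digits of a 10-digit number
shift = np.power(np.int64(10), 10 - num_of_digits)

# B then fills the trailing digits
df["C"] = df["A"].mul(shift) + df["B"]
print(df)
```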
Ch3steR

Repeat 0 as many times as needed to bring the joined strings of each row to 10 characters:

df1 = df[['A','B']].astype(str)
# Run of zeros needed to bring each row to 10 characters
s = df1.apply(lambda x: '0' * (10 - len(x.A + x.B)), axis=1)

df['new'] = df1.A + s + df1.B
print(df)
     A       B         new
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321
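
The row-wise apply can be avoided if only the final number is needed. A possible vectorized variant, sketched here as an alternative rather than as part of the answer itself, right-pads A with zeros to 10 characters using Series.str.ljust and then adds B numerically. The column name C_alt is made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [645, 645, 543, 432, 432],
    "B": [72436, 73311, 432666, 6443, 765321],
})

# "645" becomes "6450000000": A occupies the leading digits of a 10-digit number
a_shifted = df["A"].astype(str).str.ljust(10, "0").astype("int64")

# Adding B fills the trailing digits; this relies on len(A) + len(B) <= 10
df["C_alt"] = a_shifted + df["B"]
print(df)
```

Unlike the string concatenation, this yields an integer column rather than strings.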

EDIT: To improve performance by working with integers only, use np.log10, convert the result to integers and add 1 to get the number of digits, then multiply A by the corresponding power of 10:

import numpy as np

# Number of digits in A: floor(log10(A)) + 1 (for positive integers)
a = np.log10(df['A']).astype(int).add(1)

df['new'] = df.A * 10 ** (10 - a) + df.B
print(df)
     A       B         new
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321
jezrael
  • Hi, Thanks. This is very neat. However my dataframe is (potentially) very big and I was wondering if there is a way to count the number of digits of an integer (so to avoid the conversion to string and then conversion back to integer) – CAPSLOCK Jan 04 '21 at 11:17
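
One way to count digits without going through strings or floating-point logarithms is to look each value up in a sorted table of powers of ten with np.searchsorted. This is only a sketch, assuming A holds non-negative integers of at most 10 digits; the helper name count_digits and the constant POWERS_OF_TEN are hypothetical.

```python
import numpy as np
import pandas as pd

# 10, 100, ..., 10**10 as int64 (explicit dtype avoids 32-bit overflow)
POWERS_OF_TEN = 10 ** np.arange(1, 11, dtype=np.int64)

def count_digits(values):
    """Number of decimal digits of each non-negative integer (up to 10 digits)."""
    # side="right" counts how many powers of ten are <= the value
    return np.searchsorted(POWERS_OF_TEN, values, side="right") + 1

# Sample data from the question
df = pd.DataFrame({
    "A": [645, 645, 543, 432, 432],
    "B": [72436, 73311, 432666, 6443, 765321],
})

digits = count_digits(df["A"].to_numpy())

# Shift A into the leading digits, then add B
df["C"] = df["A"] * 10 ** (10 - digits) + df["B"]
print(df)
```

The lookup stays in integer arithmetic throughout, so exact powers of ten such as 10 or 100 are counted without any rounding concerns.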

You could do:

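import pandas as pd

A = df['A'].astype('str')
B = df['B'].astype('str')

# Number of zeros needed to bring each row to 10 characters
missing = 10 - (A.str.len() + B.str.len())

# Reuse df's index so the padding aligns row by row even with a non-default index
df['C'] = A + pd.Series(['0' * e for e in missing], index=df.index) + B
print(df)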

Output

     A       B           C
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321

Using only arithmetic, you could do the following (very similar to @jezrael's approach):

import numpy as np
missing = 10 - (np.log10(df['A']).astype(int) + 1)
tens = 10 ** missing

df['C'] = (df['A'] * tens) + df['B']
print(df)

Output

     A       B           C
0  645   72436  6450072436
1  645   73311  6450073311
2  543  432666  5430432666
3  432    6443  4320006443
4  432  765321  4320765321

But beware of solutions involving log10; see the linked discussion.
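
The linked discussion is not reproduced here. As a cautious illustration of one kind of issue that can arise, np.log10 works in floating point and is undefined at zero, so a log-based digit count can be cross-checked against the string length on the actual data before it is trusted. The function names below are hypothetical, and the extra 0 in the sample is added only to trigger a mismatch.

```python
import numpy as np
import pandas as pd

def digits_via_log10(s):
    """Digit count from floor(log10(x)) + 1, kept as float so -inf survives."""
    with np.errstate(divide="ignore"):  # silence the warning for log10(0)
        return np.floor(np.log10(s)) + 1

def digits_via_str(s):
    """Digit count from the string representation (reference result)."""
    return s.astype(str).str.len()

# Values of A from the question, plus a 0 added for illustration
a = pd.Series([645, 543, 432, 0])

# Rows where the two methods disagree
mismatch = digits_via_log10(a) != digits_via_str(a)
print(a[mismatch])
```

If the mismatch selection is empty for the real data, the log-based count agrees with the string-based one on that data.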

Dani Mesejo
  • Hi, Thanks. My dataframe is (potentially) very big and I was wondering if there is a way to count the number of digits of an integer (so to avoid the conversion to string and then conversion back to integer) – CAPSLOCK Jan 04 '21 at 11:17
  • Your second approach seems very similar to my second one. – jezrael Jan 04 '21 at 11:40