Splitting a number and creating new individual columns for each number split using Pandas

Question

Good day,

I have a data frame with a column like this, let us assume with 1000rows but here is a sample:

I wish to split the number into two separate numbers. I want the output to look like this:

 A    B    C
12    1    2
24    2    4
36    3    6
48    4    8

How can I achieve this using Pandas and Numpy? Help would be truly appreciated. Thanks in advance!

jezrael · Accepted Answer · 2018-06-13T08:34:05.347

3

Use floor and mod:

df['B'] = df['A'] // 10
df['C'] = df['A'] % 10

print (df)
    A  B  C
0  12  1  2
1  24  2  4
2  36  3  6
3  48  4  8

If input data are strings is possible indexing by positions by []:

print (df['A'].apply(type))
0    <class 'str'>
1    <class 'str'>
2    <class 'str'>
3    <class 'str'>
Name: A, dtype: object

df['B'] = df['A'].str[0]
df['C'] = df['A'].str[1]
#if necessary convert all columns to integers
df = df.astype(int)
print (df)
    A  B  C
0  12  1  2
1  24  2  4
2  36  3  6
3  48  4  8

edited Jun 13 '18 at 08:34

answered Jun 13 '18 at 08:20

jezrael

822,522
95
1,334
1,252

what if the numbers are random and are not by divisible by 12 can this still be used? @jezrael – Deepak M Jun 13 '18 at 08:25
@DeepakM - Hmmm. solution working with numbers with length `2`, else get `487` splited to `48` and `7`. It depends what need. – jezrael Jun 13 '18 at 08:27
When I run it I get this error. `TypeError: unsupported operand type(s) for //: 'str' and 'int'` @jazrarel – Deepak M Jun 13 '18 at 08:30
@DeepakM - it means there are strings, use `df['A'].astype(int) // 10` and `df['A'].astype(int) % 10` first – jezrael Jun 13 '18 at 08:31
Ah ok, I realized there were `NaNs in there. It works but is there a way to make it work negating the NaNs. That would be nice! @jezrael – Deepak M Jun 13 '18 at 08:46
@DeepakM - So there are `NaN`s ? What is expected otuput if NaNs in new columns? – jezrael Jun 13 '18 at 08:48
Let us say it should stay NaN forms in the column... @jezrael – Deepak M Jun 13 '18 at 08:50
Ok, and `print (df['A'].apply(type))` return `string`s ? Or numbers? And for `NaN` it return `float` ? – jezrael Jun 13 '18 at 08:53
Yes returns the new columns in float and with the NaNs. @jezrael – Deepak M Jun 13 '18 at 08:59
Sorry, to many questions What return `print (df['A'].apply(type))` ? – jezrael Jun 13 '18 at 09:05
I just used dropna() first . But hey I had another question here https://stackoverflow.com/questions/50833671/creating-a-new-column-with-numbers-in-pandas-to-group-with-a-column-with-existin @jezrael – Deepak M Jun 13 '18 at 09:27

EdChum · Answer 2 · 2018-06-13T08:27:31.727

For a df that size use floordiv and mod:

In[141]:
df['B'] = df['A'].floordiv(10)
df['C'] = df['A'].mod(10)
df

Out[141]: 
    A  B  C
0  12  1  2
1  24  2  4
2  36  3  6
3  48  4  8

There are also the numpy equivalents, np.floor_divide and np.mod:

In[142]:
df['B'] = np.floor_divide(df['A'],10)
df['C'] = np.mod(df['A'],10)
df

Out[142]: 
    A  B  C
0  12  1  2
1  24  2  4
2  36  3  6
3  48  4  8

The numpy versions are faster:

%%timeit
df['B'] = df['A'].floordiv(10)
df['C']= df['A'].mod(10)
1000 loops, best of 3: 733 µs per loop

%%timeit
df['B'] = np.floor_divide(df['A'],10)
df['C'] = np.mod(df['A'],10)

1000 loops, best of 3: 491 µs per loop

score 2 · Answer 3 · answered Jun 13 '18 at 08:22

2

In [15]: df.A.astype(str).str.extractall(r'(.)')[0].unstack().astype(np.int8)
Out[15]:
match  0  1
0      1  2
1      2  4
2      3  6
3      4  8

answered Jun 13 '18 at 08:22

MaxU - stand with Ukraine

205,989
36
386
419

score 2 · Answer 4 · answered Jun 13 '18 at 08:48

Another approach based on splitting every character of the number treated as string:

df = pd.DataFrame([12, 24, 36, 48], columns=['A'])

values = df['A'].values
split = [list(str(el)) for el in values]

out = pd.DataFrame(split, columns=['B', 'C']).astype(int)

which gives:

Splitting a number and creating new individual columns for each number split using Pandas

4 Answers4