2

I have a task to take the first 6 digits of a column in pandas. However, if this number is less than 6 digits long it adds a decimal to the end of the number. Unfortunately, this is not acceptable for my needs later down the road.

I'm sure I can get rid of the decimal with various code, but It will probably be inefficient as DataFrames get larger.

Current code:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A' : [np.NaN,np.NaN,3,4,5,5,3,1,5,np.NaN], 
                    'B' : [1,0,3,5,0,0,np.NaN,9,0,0], 
                    'C' : [10,0,30,50,0,0,4,10,1,0], 
                    'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.NaN],
                    'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn',]})

wow2 = df1
wow2['D'] = wow2['D'][:6]
print(wow2)

     A    B   C       D           E
0  NaN  1.0  10  123456      Assign
1  NaN  0.0   0  123456    Unassign
2  3.0  3.0  30  123456      Assign
3  4.0  5.0  50  123456        Ugly
4  5.0  0.0   0  12345.  Appreciate <--- Notice Decimal
5  5.0  0.0   0  12345.        Undo <--- Notice Decimal
6  3.0  NaN   4     NaN      Assign
7  1.0  9.0  10     NaN    Unicycle
8  5.0  0.0   1     NaN      Assign
9  NaN  0.0   0     NaN     Unicorn

Is there a way I can leave the digit if it's length is not over 6? I thought about converting the column to string and doing a loop... But I believe that would be wildly inefficient and create more problems than solutions

MattR
  • 4,887
  • 9
  • 40
  • 67
  • 2
    Do you need the first 6 values of a pandas Series or do you need the first 6 digits of all values in a pandas Series? At the moment, you select the first 6 values of your column D with `wow2['D'][:6]`. This does not change the actual values in your column but reduces your column D to only the first 6 values. – pansen Feb 24 '17 at 16:28
  • @pansen... you're right. Wow I completely missed that. Now my only thought is to change it to a string: `wow2['D'] = wow2['D'].apply(str).str[:6]` but then it still leaves the decimals and changes the NaNs to a string – MattR Feb 24 '17 at 16:35
  • Do you need to keep the np.NaN values or should they be replaced with some other value? – pansen Feb 24 '17 at 16:44
  • Ideally, I'd like to keep them as np.NaN. – MattR Feb 24 '17 at 16:45

1 Answers1

3

To get the first 6 digits of a number (without converting to string and back), you may use the modulo operator. In order to represent your numeric values as non floating point numbers you need to convert them into integers. However, mixing integers and np.NaN within the same column will result into float64 (see here for more). To get around this (which is kind of ugly) you need to convert the integers into strings which forces the dtype to be object because you mix strings and float values.

The solution looks like the following:

wow2['D'] = wow2['D'].mod(10**6)\
   .dropna()\
   .astype(int)\
   .astype(str)

print(wow['D'])

0    123456
1    123456
2    234567
3    345678
4     12345
5     12345
6    345678
7    456789
8    234567
9       NaN
Name: D, dtype: object
Community
  • 1
  • 1
pansen
  • 6,433
  • 4
  • 19
  • 32
  • Great answer! Thank you! If possible, could you explain why `dropna()` is in there? – MattR Feb 24 '17 at 17:45
  • I also notice that I cannot run your part of the code without a fresh copy of df1 being created. even adding `.copy()` to the end of all of the `astype()` does not work. Any idea why? – MattR Feb 24 '17 at 18:51
  • @MattR First, dropna() is used to remove np.NaN values before converting to integers and strings. Otherwise you are going to have "nan" strings instead of np.NaN values. Second, your *fresh copy* comment indicates that you are trying to apply the same operation multiple times on an already altered object. You can't execute the modulo operation on column D twice because the second time you have strings instead of numbers. – pansen Feb 24 '17 at 19:19
  • The OP asked the first six digits instead of the last six digits – SUNDONG Oct 29 '20 at 05:53