31

I'm trying to see if I can remove the trailing zeros from this phone number column.

Example:

0
1      8.00735e+09
2      4.35789e+09
3      6.10644e+09

The dtype of this column is object, and I tried to round it but got an error. I checked a couple of values, and they are in the format "8007354384.0"; I want to get rid of the trailing ".0" (the decimal point and the zero).

Sometimes I receive the data in this format and sometimes I don't, in which case the values are integers. I would like to check whether each phone value has a trailing ".0" and, if so, remove it.

I have this code, but I'm stuck on how to check for the trailing zero in each row.

data.ix[data.phone.str.contains('.0'), 'phone']

I get this error: `ValueError: cannot index with vector containing NA / NaN values`. I believe the issue is that some rows have empty data, which I sometimes receive. The code should be able to skip empty rows.
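A minimal sketch of where the error comes from and one way around it, assuming the column mixes empty strings, floats, and strings such as "8007354384.0" (the sample values below are made up): `str.contains` returns NaN for entries that are not strings, and a boolean indexer containing NaN cannot be used with `.loc`. Passing `na=False` treats those entries as non-matches. The pattern is also escaped and anchored (`r'\.0$'`), because '.0' as a regular expression matches any character followed by 0, anywhere in the string.

```python
import pandas as pd

# Hypothetical sample: empty string, a float, a string ending in ".0", a plain string
phone = pd.Series(['', 8.00735e+09, '8007354384.0', '4357890000'])

# na=False turns the NaN that str.contains returns for the float into False,
# so the mask is purely boolean and can be used with .loc
has_trailing_zero = phone.str.contains(r'\.0$', na=False)

cleaned = phone.copy()
# Drop the last two characters (".0") only where the pattern matched
cleaned.loc[has_trailing_zero] = phone.loc[has_trailing_zero].str[:-2]
print(cleaned)
```

The float row is left untouched in this sketch; converting genuine floats is what the answers below deal with.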

Does anybody have any suggestions? I'm new to pandas, but so far it's a useful library. Your help will be appreciated.

Note: in the example above, the first row is empty, which I sometimes get. I want to make sure this is not represented as 0 for the phone number.

Also, empty data is a string, so if some rows are empty the column is a mix of floats and strings.

Mr. T
medev21

12 Answers

15

Use `astype(np.int64)`:

import numpy as np
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
mask = pd.to_numeric(s).notnull()
s.loc[mask] = s.loc[mask].astype(np.int64)
s

0              
1    8007350000
2    4357890000
3    6106440000
dtype: object
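A variant of the same masking idea, offered as a sketch rather than a drop-in fix: `pd.to_numeric(..., errors='coerce')` turns anything it cannot parse (such as the empty string) into NaN instead of raising, so the mask only selects real numbers and the empty row stays ''.

```python
import numpy as np
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])

# errors='coerce' maps unparseable values (here '') to NaN instead of raising
numeric = pd.to_numeric(s, errors='coerce')
mask = numeric.notna()

# s is object dtype, so integers and '' can coexist in the result
result = s.copy()
result.loc[mask] = numeric.loc[mask].astype(np.int64)
print(result)
```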
piRSquared
  • This is not working because of the empty data in the first row: `ValueError: invalid literal for long() with base 10: ''`. I am not familiar with dtype object, but it looks like this is being treated as a string. Correct me if I'm wrong. – medev21 Feb 22 '17 at 23:19
  • I noticed that since the first row is empty, it's treated as a string, which throws that error. Is there a way to bypass it? – medev21 Feb 22 '17 at 23:33
  • What do you want the result to be for that row? – piRSquared Feb 22 '17 at 23:36
  • For that row it should be empty, so I'm OK with the empty string (''); for the rest I want the trailing zeros removed. – medev21 Feb 22 '17 at 23:46
  • It seems the right way... but still, I have no idea why this is not working in my case :( I have `df.loc[df.my_var.notnull(),'my_var'].astype(np.int64)` – jjrr Jun 30 '18 at 17:39
  • Was there any definitive answer for this? I have the same kind of question and can't get it to work. I've tried this approach and the one below. – excelguy Dec 13 '19 at 18:34
  • There are many nuances specific to the context of the actual problem. OP never got back with answers that would have clarified the context. Feel free to ask a new question. – piRSquared Dec 13 '19 at 18:50
13

In pandas/NumPy, the NumPy integer dtypes cannot hold NaN values, and arrays/Series (including DataFrame columns) are homogeneous in their dtype, so a column of such integers where some entries are None/np.nan is impossible.
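A small sketch of the upcasting described above. As an aside the answer does not cover, pandas 0.24 and later also ship a nullable integer extension dtype ('Int64') that can hold missing values alongside integers; the phone number below is illustrative.

```python
import numpy as np
import pandas as pd

# A NaN forces a NumPy-backed integer column up to float64
plain = pd.Series([8007354384, np.nan])
print(plain.dtype)

# The nullable "Int64" extension dtype keeps the integer and stores
# the gap as a missing value instead
nullable = pd.Series([8007354384, np.nan], dtype='Int64')
print(nullable)
```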

EDIT: `data.phone.astype('object')` should do the trick; in this case, pandas treats your column as a series of generic Python objects rather than a specific dtype (e.g. str/float/int), at the cost of performance if you intend to run heavy computations on this data (probably not in your case).

Assuming you want to keep those NaN entries, your approach of converting to strings is a valid possibility:

data.phone.astype(str).str.split('.', expand = True)[0]

should give you what you're looking for (there are alternative string methods you can use, such as .replace or .extract, but .split seems the most straightforward in this case).
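One caveat, shown in the hedged sketch below: `astype(str)` turns NaN into the literal string 'nan', so missing entries do not actually stay missing after the split. Restoring them afterwards with `Series.where` is one option (the sample values are illustrative).

```python
import numpy as np
import pandas as pd

phone = pd.Series([np.nan, 8.00735e+09, 4.35789e+09])

# Keep everything before the decimal point
as_text = phone.astype(str).str.split('.', expand=True)[0]

# astype(str) turned NaN into 'nan'; put the missing value back
as_text = as_text.where(phone.notna())
print(as_text)
```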

Alternatively, if you are only interested in the display of floats (unlikely I'd suppose), you can do pd.set_option('display.float_format','{:.0f}'.format), which doesn't actually affect your data.

Ken Wei
11

This answer by cs95 removes the trailing ".0" in one line:

df = df.round(decimals=0).astype(object)
cs95
Brohm
  • Pretty sure [I offered OP this solution](https://stackoverflow.com/questions/47542657/round-pandas-column-with-precision-but-no-trailing-0/47542713?noredirect=1#comment104856358_47542713) (happened under the answer you linked, too), apparently no bueno. No idea what OP wants ;-) – cs95 Dec 14 '19 at 22:43
8
import numpy as np
import pandas as pd

s = pd.Series([ None, np.nan, '',8.00735e+09,  4.35789e+09, 6.10644e+09])

s_new = s.fillna('').astype(str).str.replace(".0","",regex=False)
s_new

Here I fill null values with an empty string, convert the Series to strings, and replace ".0" with an empty string.
This outputs:

0              
1              
2              
3    8007350000
4    4357890000
5    6106440000
dtype: object
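If pandas 1.4 or newer is available (an assumption; `Series.str.removesuffix` does not exist in older versions), the same pipeline can strip ".0" only when it forms the last two characters, so a ".0" elsewhere in a string is never touched:

```python
import numpy as np
import pandas as pd

s = pd.Series([None, np.nan, '', 8.00735e+09, 4.35789e+09, 6.10644e+09])

# Requires pandas >= 1.4: removesuffix only strips a trailing '.0'
s_new = s.fillna('').astype(str).str.removesuffix('.0')
print(s_new)
```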
erncyp
7

Just do

data['phone'] = data['phone'].astype(str)
data['phone'] = data['phone'].str.replace('.0', '', regex=False)

which converts every entry in the column to a string and then removes each literal '.0' (regex=False stops '.' from being treated as "any character"). For example:

import pandas as pd

data = pd.DataFrame(
    data = [['bob','39384954.0'],['Lina','23827484.0']], 
    columns = ['user','phone'], index = [1,2]
)

data['phone'] = data['phone'].astype(str)
data['phone'] = data['phone'].str.replace('.0', '', regex=False)
print(data)

   user     phone
1   bob  39384954
2  Lina  23827484
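As BossRoyce's comment below points out, treating '.0' as a regular expression is risky: '.' matches any character, so digit pairs ending in 0 get removed too. A small sketch comparing both modes on one hypothetical 10-digit number:

```python
import pandas as pd

phone = pd.Series(['8007354384.0'])

# Regex mode: '.0' matches "80" at the start as well as the final ".0"
as_regex = phone.str.replace('.0', '', regex=True)

# Literal mode: only the actual ".0" substring is removed
literal = phone.str.replace('.0', '', regex=False)

print(as_regex[0], literal[0])
```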
Tony
Некто
  • I tried to do that, but I get an error: `TypeError: to_numeric() got an unexpected keyword argument 'downcast'`. Here is my code: `pd.to_numeric(data.phone, errors='ignore', downcast='integer')` – medev21 Feb 23 '17 at 00:09
  • What version of pandas do you have? – Некто Feb 23 '17 at 00:19
  • The pandas version is "0.18.1" – medev21 Feb 23 '17 at 00:20
  • What if I have several columns? How do I do that in one command? Besides that, it did not work for me (I don't know why). – pashute Nov 28 '17 at 23:59
  • I had to specify the optional parameter regex=False in the str.replace function; the default value of True was replacing all 1s and 0s. – BossRoyce Jun 25 '18 at 17:55
4

So, pandas automatically assigns the dtype by looking at the data. When the data is mixed, e.g. some rows are NaN and some hold int values, it will very likely assign dtype object or float64.

EX 1:

import pandas as pd

data = [['tom', 10934000000], ['nick', 1534000000], ['juli', 1412000000]]
df = pd.DataFrame(data, columns = ['Name', 'Phone'])

>>> df
   Name        Phone
0   tom  10934000000
1  nick   1534000000
2  juli   1412000000

>>> df.dtypes
Name     object
Phone     int64
dtype: object

In the above example, pandas infers dtype int64 because no row has NaN and every value in the Phone column is an integer.

EX 2:

>>> data = [['tom'], ['nick', 1534000000], ['juli', 1412000000]]
>>> df = pd.DataFrame(data, columns = ['Name', 'Phone'])
>>> df
   Name         Phone
0   tom           NaN
1  nick  1.534000e+09
2  juli  1.412000e+09

>>> df.dtypes
Name      object
Phone    float64
dtype: object

To answer your actual question: to get rid of the .0 at the end, you can do something like this:

Solution 1:

>>> data = [['tom', 9785000000.0], ['nick', 1534000000.0], ['juli', 1412000000]]
>>> df = pd.DataFrame(data, columns = ['Name', 'Phone'])
>>> df
   Name         Phone
0   tom  9.785000e+09
1  nick  1.534000e+09
2  juli  1.412000e+09

>>> df['Phone'] = df['Phone'].astype(int).astype(str)
>>> df
   Name       Phone
0   tom  9785000000
1  nick  1534000000
2  juli  1412000000

Solution 2:

>>> df['Phone'] = df['Phone'].astype(str).str.replace('.0', '', regex=False)
>>> df
   Name       Phone
0   tom  9785000000
1  nick  1534000000
2  juli  1412000000
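Solution 1 assumes the column has no missing values; on a column like the one in EX 2, `astype(int)` raises because NaN cannot be converted to an integer. A sketch that keeps the missing entry, assuming pandas >= 0.24 for the nullable 'Int64' dtype:

```python
import numpy as np
import pandas as pd

# Same shape as EX 2, written with an explicit NaN for the missing phone
data = [['tom', np.nan], ['nick', 1534000000], ['juli', 1412000000]]
df = pd.DataFrame(data, columns=['Name', 'Phone'])  # Phone is float64 here

# astype(int) would raise on the NaN; nullable Int64 keeps it as <NA>
df['Phone'] = df['Phone'].astype('Int64')
print(df)
```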
Shyam Bhimani
3

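Try `str.isnumeric` with `astype` and `loc`:

import numpy as np
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
# isnumeric() is False for '' and NaN for the floats; astype(bool) makes NaN True
c = s.str.isnumeric().astype(bool)
s.loc[c] = s.loc[c].astype(np.int64)
print(s)

Output: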

0              
1    8007350000
2    4357890000
3    6106440000
dtype: object
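The `str.isnumeric` mask selects the float rows only because non-string entries produce NaN, which `astype(bool)` turns into True. A more explicit mask, assuming the only non-numeric value in the column is the empty string, could look like this:

```python
import numpy as np
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])

# True for every entry that is not the empty string
c = s.ne('')
s.loc[c] = s.loc[c].astype(np.int64)
print(s)
```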
U13-Forward
  • Seems like I cannot get it to work. I am making c from my dataframe column like so: `c=df['id'].str.isnumeric().astype(bool)`, then doing `df['id'].loc[c] = df['id'].loc(c).astype(np.int64)`. Any ideas on this? – excelguy Dec 15 '19 at 17:33
  • Any ideas, Monica? – excelguy Dec 16 '19 at 17:39
  • @excelguy Change: `df['id'].loc[c] = df['id'].loc(c).astype(np.int64)` to `df['id'].loc[c] = df['id'].loc[c].astype(np.int64)` – U13-Forward Dec 17 '19 at 03:19
3

Here is a solution using pandas nullable integers (the solution assumes that input Series values are either empty strings or floating point numbers):

import pandas as pd, numpy as np
s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
s.replace('', np.nan).astype('Int64')

Output (pandas-0.25.1):

0           NaN
1    8007350000
2    4357890000
3    6106440000
dtype: Int64

Advantages of the solution:

  • The output values are either integers or missing values (not 'object' data type)
  • Efficient
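If the end goal is text without the trailing ".0" (for example, to write the column to a CSV), the nullable integers can be converted to pandas' 'string' dtype, which keeps the missing value as a real missing value rather than the text '<NA>'. A sketch assuming pandas >= 1.0, where that dtype was introduced:

```python
import numpy as np
import pandas as pd

s = pd.Series(['', 8.00735e+09, 4.35789e+09, 6.10644e+09])
ints = s.replace('', np.nan).astype('Int64')

# 'string' dtype: digits without '.0', missing entry stays <NA>
as_text = ints.astype('string')
print(as_text)
```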
S.V
2

It depends on the format in which the telephone number is stored.

If it is in a numeric format, changing to an integer might solve the problem (use int64: a 10-digit phone number overflows int32, whose maximum is 2,147,483,647):

import pandas as pd

df = pd.DataFrame({'TelephoneNumber': [123.0, 234]})
df['TelephoneNumber'] =  df['TelephoneNumber'].astype('int64')

If it is really a string, you can replace the suffix and re-assign the column:

df2 = pd.DataFrame({'TelephoneNumber': ['123.0', '234']})
df2['TelephoneNumber'] = df2['TelephoneNumber'].str.replace('.0', '', regex=False)
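The two cases above can be folded into one function that checks the column's dtype first. This is a sketch with a hypothetical helper name (`strip_trailing_zero`); it swaps in the nullable 'Int64' dtype for the numeric case so that NaN does not make the conversion fail, and an anchored regex for the text case so only a final ".0" is removed.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def strip_trailing_zero(col: pd.Series) -> pd.Series:
    """Hypothetical helper: choose the conversion based on the column dtype."""
    if is_numeric_dtype(col):
        # Numeric column: nullable Int64 drops the '.0' and tolerates NaN
        return col.astype('Int64')
    # Text or mixed column: remove a literal '.0' only at the end of each value
    return col.astype(str).str.replace(r'\.0$', '', regex=True)


print(strip_trailing_zero(pd.Series([123.0, 234])))
print(strip_trailing_zero(pd.Series(['123.0', '234'])))
```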
Marcel Flygare
  • You also need to add regex=False, i.e. df2['TelephoneNumber'].str.replace('.0', '', regex=False), as in Shyam Bhimani's answer. – DOT Dec 30 '22 at 17:01
1
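import numpy as np

tt = 8.00735e+09
# format_float_positional gives '8007350000.' (trailing decimal point kept);
# [:-1] drops that '.' before converting to int
time = int(np.format_float_positional(tt)[:-1])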
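NumPy's `format_float_positional` also accepts a `trim` argument; with `trim='-'` it drops trailing zeros and the trailing decimal point itself, so the string slicing is not needed. A sketch using the same value:

```python
import numpy as np

tt = 8.00735e+09

# trim='-' removes trailing zeros and any trailing decimal point
digits = np.format_float_positional(tt, trim='-')
as_int = int(digits)
print(digits, as_int)
```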
chrisckwong821
0

In case anybody is still interested: I had the problem that rounding the df left trailing zeros. Here is what I did:

new_df = np.round(old_df,3).astype(str)

Then all the trailing zeros were gone in new_df.

Tiago Martins Peres
ahmadi
0

I was also facing the same problem with empty strings in some rows.

The most helpful answer on the question "Python - Remove decimal and zero from string" helped me.

New_Noob
  • This should be a comment; there is no need to add it as a separate answer. You will be able to add comments after gaining some reputation on the platform. – Sagar Darekar May 19 '22 at 04:37