
After reading an Excel file using pandas, I am validating that the "Numberss" column contains only integer values. If any other value is present, I display an error message.

When I loop through the column items, the values are automatically converted to float:

for index, item in self.df["Numberss"].items():
    print(item)

If my input column is

5, 5.65, 5.55, 6

the print(item) output is:

5.0, 5.65, 5.55, 6.0

instead of

5, 5.65, 5.55, 6
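
For reference, a minimal in-memory sketch (using the same example values, not the actual Excel file) that reproduces the same promotion to float:

import pandas as pd

# mixing ints and floats in one column forces pandas to use a single float64 dtype
df = pd.DataFrame({"Numberss": [5, 5.65, 5.55, 6]})
print(df["Numberss"].dtype)   # float64

for index, item in df["Numberss"].items():
    print(item)               # 5.0, 5.65, 5.55, 6.0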
useruser00
  • Can you print(df.dtypes) – Umar.H Apr 09 '19 at 05:12
  • @Datanovice it's coming as Numberss - float64. If all the values in that column are integers, print(df.dtypes) shows int64 – useruser00 Apr 09 '19 at 05:42
  • you can use `astype('int')`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html – hacker315 Apr 09 '19 at 06:04
  • @hacker315 I already tried astype(int, errors='raise'), but it converts the values to int instead of displaying an error message. Is there any way to find non-integer values in a Series? – useruser00 Apr 09 '19 at 06:11
  • I don't know if there's a function like this, but specifically for checking floats we can use `series_name.astype('str').str.contains("\.").any()` to know if there's a float – hacker315 Apr 09 '19 at 06:18
  • two ways to deal with this: if you have values in all your fields (no strings, NaNs or datetimes) you can use `.astype(int)`; if you have mixed types and need to convert the column, try `pd.to_numeric(df[col], errors='coerce')`. Let me know if that works. Finally, if you only have NaNs you want to fill, you can first use `.fillna(0)` on the entire df. – Umar.H Apr 09 '19 at 08:45
  • @Datanovice Finally I found the non-integers in the column using the modulus operation and compared the results: `intcheck = (df.Numberss % 1) == 0; print(df[intcheck]) # rows containing only integer values; print(df[~intcheck]) # rows containing non-integer data` (a runnable sketch of this check appears below the comments) – useruser00 Apr 09 '19 at 09:03
  • very clever solution, but I think you could use some of the pandas functionality to achieve it with less verbose code (it depends on your use case). If my solution is not the correct one, make sure you post and green-tick your own to help others. – Umar.H Apr 09 '19 at 09:07
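
A runnable sketch of the modulus check from the comment above, using the example values from the question:

import pandas as pd

df = pd.DataFrame({"Numberss": [5, 5.65, 5.55, 6]})   # read as float64

intcheck = (df["Numberss"] % 1) == 0
print(df[intcheck])    # rows whose "Numberss" value is a whole number
print(df[~intcheck])   # rows with non-integer values, e.g. to flag with an error message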

4 Answers


While reading, pandas automatically reads the column as float if it contains any float values (the whole column is read as float).

This link may help you:

How to check if float pandas column contains only integer numbers?
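
A minimal sketch of the kind of check discussed in that linked question, assuming the column has already been read as float64:

import pandas as pd

s = pd.Series([5.0, 5.65, 5.55, 6.0])   # float64, as read from Excel

# True only if every value is a whole number (NaN values would also fail this check)
only_integers = (s % 1 == 0).all()
print(only_integers)      # False, because of 5.65 and 5.55
print(s[s % 1 != 0])      # the offending non-integer values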

Vinay G
  • _thanks for the info_, I was expecting suggestions on how to avoid the conversion of int to float. – useruser00 Apr 09 '19 at 09:06
  • some values of the column are float; pandas assigns the whole column one datatype, so you can't have float as well as int at the same time – Vinay G Apr 10 '19 at 10:07

It is because your column contains integer and null values. Pandas reads these null values as NaN, which is internally stored as a float.

You will have to replace these NaN values with an integer value if you want the column type to be int.
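
A minimal sketch of that approach, assuming the NaNs can safely be replaced with 0 (the nullable Int64 dtype is only available in newer pandas versions):

import pandas as pd

s = pd.Series([5.0, None, 6.0])     # float64 because of the missing value

print(s.fillna(0).astype(int))      # 5, 0, 6 as int64

# alternative in newer pandas: keep the missing value as <NA> instead of 0
print(s.astype("Int64"))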

Ronak Mutha

As already discussed by many, this problem may happen for the following reasons.

  1. The Excel column has values which are considered NaN or NaT by pd.read_excel.
  2. The Excel column has float values.

To avoid this, one can use the options below, already included in the pd.read_excel method.

pd.read_excel(filepath, keep_default_na=False)   # to handle the 1st reason
pd.read_excel(filepath, convert_float=True)      # to handle the 2nd reason (True by default)
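
A hedged usage sketch (the file name is hypothetical):

import pandas as pd

# keep empty Excel cells as empty strings instead of NaN, so an all-integer
# column is not silently promoted to float64 (it becomes object dtype if any
# cells are empty)
df = pd.read_excel("numbers.xlsx", keep_default_na=False)
print(df.dtypes)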
Techno_Eagle

Just some basic examples to illustrate your issues; hope this helps.

The issues you may have could be the following:

  • you have multiple datatypes in your column, or
  • you have NaNs in your column, which need to be filled before converting it back to an int.

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.uniform(0,3.5,size=5),columns=['Nums'])
print(df)
    Nums
0   1.306457
1   1.921983
2   0.811730
3   0.208760
4   1.946754

It's quite obvious that these are floats, so let's check the dtypes.

print(df.dtypes)
Nums    float64
dtype: object

So, let's cast this to an int:

df['Nums'] = df['Nums'].astype(int)
print(df)
Nums
0   1
1   1
2   0
3   0
4   1

But what if we had text and None types within our df?

types = ['String', np.nan, 'Other', np.nan, 'More Text']
df1 = pd.DataFrame({'Nums' : types})
df = pd.concat([df,df1],ignore_index=True)
print(df)
    Nums
0   1
1   1
2   0
3   0
4   1
5   String
6   NaN
7   Other
8   NaN
9   More Text
# Let's coerce the non-numeric values to NaN first
df['Nums'] = pd.to_numeric(df['Nums'],errors='coerce')
# now let's fill the NaNs with 0 and cast back to an int column
df['Nums'] = df['Nums'].fillna(0).astype(int)
print(df)
    Nums
0   1
1   1
2   0
3   0
4   1
5   0
6   0
7   0
8   0
9   0
Umar.H
  • This will work fine for the scenario where I need to fill 0 for non-integers. But I need to identify the non-integers and add some comments without converting them to int. – useruser00 Apr 09 '19 at 09:10
  • you could create a helper column and keep the column datatypes' integrity intact, something like `df.loc[df.Nums.isna(), 'Comments'] = 'My Comments'` – Umar.H Apr 09 '19 at 09:19
  • OK, the problem I have is with float values in the integer column, so I tried to avoid the float conversion using the code I mentioned – useruser00 Apr 09 '19 at 09:25