2

I'm trying to figure out how to check an entire column to verify that all values are integers, except one, using Python pandas. One named row will always have a float num. CSV example:

name, num
random1,2
random2,3
random3,2.89
random4,1
random5,3.45

In this example, let's say random3's num will always be a float. So the fact that random5 is also a float means the program should print an error to the terminal telling the user this.

Jacques
blindside044

2 Answers

0

Try this:

if len(df.num.apply(type) == float) >= 2:
    print(f"Ups!. There are {len(df.num.apply(type) == float)} float numbers in the column")

Each component explanation:

df.num.apply(type) # Generates a Series holding the Python type of each element in the column
(df.num.apply(type) == float) # Boolean Series that is True where the element's type is float
  • Your code doesn't do what the OP requested. `len(df.num.apply(type)==float)` returns `5`, which is the length of the series, not the number of floats in the series. – Craig Jul 22 '20 at 22:26
  • @Craig be sure your data has been set up properly. First apply df.num.apply(type) to check all the types in the DataFrame, and after that apply the filter. – David Felipe Medina Mayorga Jul 22 '20 at 22:28
  • I cannot reproduce your results, please add your test data to the answer. – Craig Jul 22 '20 at 22:36
  • @Craig. It seems that your DataFrame converts all the integers into floats. That's why your result is 5. You're not setting up the DataFrame properly as shown by the OP. – David Felipe Medina Mayorga Jul 22 '20 at 22:40
0

When the pandas read_csv() function loads the CSV file into a dataframe, it will assign the float dtype to any column that contains float and integer values. To test if the elements of the column can be expressed exactly as integers, you can use the .is_integer() method of floats as described in the answer to How to check if float pandas column contains only integer numbers?
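As a minimal sketch of the dtype behaviour described above, the snippet below loads the question's CSV from an inline string (an assumption made so the example is self-contained; the header is written without the space after the comma) and compares a type check with a value check:

```python
import io

import pandas as pd

# The CSV from the question, inlined so the sketch runs without a file.
csv_text = """name,num
random1,2
random2,3
random3,2.89
random4,1
random5,3.45
"""

df = pd.read_csv(io.StringIO(csv_text))

# A single non-integer value is enough for the whole column to be parsed as float.
print(df["num"].dtype)

# Counting elements whose type is float cannot separate 2.0 from 2.89.
print(df["num"].apply(type).eq(float).sum())

# float.is_integer() inspects the value rather than the type.
is_whole = df["num"].apply(float.is_integer)
print(is_whole.tolist())
```

The last line is the per-row information the checks below rely on: True where the value can be expressed exactly as an integer.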

In your case, you want to verify that you have only one float in the column, so do this:

import pandas as pd
df = pd.DataFrame({'name':[f"random{i}" for i in range(1,6)], 'num':[2, 3, 2.89, 1, 3.45]})

if sum(~df.num.apply(float.is_integer)) != 1:
    print("Error, the data column contains the wrong number of floats!")

If it is possible that the column only contains integers, then the column will have an integer dtype and the above code will cause an error. You could catch the error, or you could also test for this case:

from pandas.api.types import is_float_dtype

if not is_float_dtype(df.num) or sum(~df.num.apply(float.is_integer)) != 1:
    print("Error, the data column contains the wrong number of floats!")
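The checks above only count non-integer values. If the goal is also to report which rows are at fault, with one known row exempt as in the question, something along these lines might work. This is only a sketch: the helper name unexpected_float_rows and the allowed_name parameter are hypothetical, and it does not verify that the exempt row itself holds a float.

```python
import pandas as pd
from pandas.api.types import is_float_dtype


def unexpected_float_rows(df, allowed_name="random3"):
    """Return the names of rows, other than allowed_name, whose num is not a whole number."""
    nums = df["num"]
    if not is_float_dtype(nums):
        # An integer dtype cannot hold fractional values, so nothing to report.
        return []
    # True where the value cannot be expressed exactly as an integer (NaN counts too).
    fractional = ~nums.apply(float.is_integer)
    offenders = df.loc[fractional & (df["name"] != allowed_name), "name"]
    return offenders.tolist()


df = pd.DataFrame({"name": [f"random{i}" for i in range(1, 6)],
                   "num": [2, 3, 2.89, 1, 3.45]})

bad = unexpected_float_rows(df)
if bad:
    print(f"Error, unexpected float values in rows: {bad}")
```

With the example data, only random5 is reported, because random3 is the exempt row.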
Craig