I want to use the great expectations package to validate that a column in a .csv file only contains integers.
The file I am using has only integers in the age column except for one row which has a '`' character instead. This is what I want the expectation to catch. I have also checked the .csv file in a text editor and can confirm that the ages in the age column are not enclosed in quotes.
However, the expectation fails one 100% of the data. I think it is because pandas is reading the column in as the object type (so a string) because of the one incorrect row. I can preprocess this using something like .astype(int)
becauseit will fail on that row. And wrapping .astype(int)
in a try
block would completely defeat the purpose of using great expectations for this.
Here is a minimal working example:
good.csv:
age,name
34,Fred
22,Bob
54,Mary
bad.csv:
age,name
34,Fred
`,Bob
54,Mary
Code:
import great_expectations as ge
df = ge.read_csv("./good.csv");
my_df.expect_column_values_to_be_of_type('age','int')
df = ge.read_csv("./bad.csv");
my_df.expect_column_values_to_be_of_type('age','int')
The first case returns
{'success': True,
'result': {'element_count': 3,
'missing_count': 0,
'missing_percent': 0.0,
'unexpected_count': 0,
'unexpected_percent': 0.0,
'unexpected_percent_nonmissing': 0.0,
'partial_unexpected_list': []}}
So all the ages are ints and it succeeds on every row. I expect the second case to fail, but only on the second row. However it fails on all the rows:
{'success': False,
'result': {'element_count': 3,
'missing_count': 0,
'missing_percent': 0.0,
'unexpected_count': 3,
'unexpected_percent': 1.0,
'unexpected_percent_nonmissing': 1.0,
'partial_unexpected_list': ['34', '`', '54']}}
So I guess I expect something like this:
{'success': False,
'result': {'element_count': 3,
'missing_count': 0,
'missing_percent': 0.0,
'unexpected_count': 1,
'unexpected_percent': 0.33,
'unexpected_percent_nonmissing': 1.0,
'partial_unexpected_list': ['`']}}
Is there something I am doing wrong, or is the package just not capable of this?