I have a large dataset that I imported with read_csv as shown below.
The measurement columns should contain only floats and NaN.
df = pd.read_csv(file_,parse_dates=[['Date','Time']],na_values = ['No Data','Bad Data','','No Sample'],low_memory=False)
When I check df.dtypes, most of the columns come back as object, which indicates that there are other values in the dataframe that I am not aware of. I am looking for a way to identify those strings and replace them with NA values.
The first thing I tried was converting everything to dtype = np.float, but that failed. Then I tried reading each cell by (row, column) position and collecting the strings I found.
I tried something very inefficient and time-consuming (I am a beginner). It has worked on other dataframes, but here it raises an error:
TypeError: argument of type 'float' is not iterable
from isstring import *
list_string = []
for i in range(0, len(df)):
    for j in range(0, len(df.columns)):
        x = test.ix[i, j]
        if isstring(x) and '.' not in x:
            list_string.append(x)
list_string = pd.DataFrame(list_string, columns=["list_string"])
g = list_string.groupby('list_string').size()
Is there a simple way to detect unknown strings in a large dataset? Thanks.
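One possible approach, offered as a minimal sketch rather than a definitive answer: missing values in an object column are float NaN, and a test such as `'.' in x` raises `TypeError: argument of type 'float' is not iterable` when `x` is a float, so the check has to select real strings first. The sketch below does that per column and then uses `pd.to_numeric(..., errors="coerce")` to see which strings cannot be parsed as numbers. The function name `find_unparsed_strings` and the sample dataframe are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd


def find_unparsed_strings(df):
    """Return {column: counts of string values that do not parse as numbers}."""
    counts = {}
    for col in df.columns:
        s = df[col]
        # Columns that are already numeric cannot contain stray strings.
        if pd.api.types.is_numeric_dtype(s):
            continue
        # Keep only real strings; NaN is a float and is skipped here.
        strings = s[s.map(lambda v: isinstance(v, str))]
        if strings.empty:
            continue
        # errors="coerce" turns anything that is not a number into NaN.
        bad = strings[pd.to_numeric(strings, errors="coerce").isna()]
        if not bad.empty:
            counts[col] = bad.value_counts()
    return counts


# Hypothetical sample data standing in for the real file.
sample = pd.DataFrame({
    "a": ["1.5", "Calibration", np.nan, "2"],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": ["<0.1", "3.2", "<0.1", np.nan],
})
found = find_unparsed_strings(sample)
for col, value_counts in found.items():
    print(col, value_counts.to_dict())
```

The strings reported this way could then be added to the `na_values` list passed to `read_csv`, or each affected column could be converted directly with `pd.to_numeric(df[col], errors="coerce")`, which replaces every unparseable value with NaN.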