I have the following code:
X = df_X.as_matrix(header[1:col_num])
scaler = preprocessing.StandardScaler().fit(X)
X_nor = scaler.transform(X)
Running it gives the following error:
File "/Users/edamame/Library/python_virenv/lib/python2.7/site-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
" or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I used:
print(np.isinf(X))
print(np.isnan(X))
which gives me the output below. This doesn't tell me which element is the problem, because the matrix has millions of rows and the printed output is truncated.
[[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]
...,
[False False False ..., False False False]
[False False False ..., False False False]
[False False False ..., False False False]]
Is there a way to identify which values in the matrix X actually cause the problem? How do people avoid this in general?
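
To make the question concrete, here is a minimal sketch of the kind of check I am after, assuming X can be converted to a float64 NumPy array. The names find_non_finite and X_demo are hypothetical and only for illustration; X_demo is a tiny stand-in for the real matrix. It reports the (row, column) position of each NaN or infinite entry instead of printing the whole boolean array, and then shows one possible way to avoid the error by dropping the affected rows.

```python
import numpy as np

def find_non_finite(X):
    """Return the (row, col) index pairs of entries that are NaN or +/-inf."""
    # Assumption: X is numeric and can be converted to float64.
    X = np.asarray(X, dtype=np.float64)
    return np.argwhere(~np.isfinite(X))

# Hypothetical small stand-in for X (the real matrix has millions of rows).
X_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, np.nan, 6.0],
                   [7.0, 8.0, np.inf]])

# Positions of the offending entries, one [row, col] pair per line.
bad = find_non_finite(X_demo)
print(bad)

# Quick yes/no summaries that stay readable at any matrix size.
print(np.isnan(X_demo).any(), np.isinf(X_demo).any())

# One possible way to avoid the error: keep only rows where every value is finite.
X_clean = X_demo[np.isfinite(X_demo).all(axis=1)]
print(X_clean.shape)
```

Dropping rows is only one option; whether that, or replacing the bad values, is appropriate depends on the data.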