I think that is not necessary if you use read_csv with sep='\s+' for a whitespace separator, together with the names parameter to specify the new column names:
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/heart.dat"
cols = ['age','sex','chestpain','restBP','chol','sugar','ecg',
        'maxhr','angina','dep','exercise','fluor','thal','diagnosis']
df = pd.read_csv(url, sep=r'\s+', names=cols)
print (df)
age sex chestpain restBP chol sugar ecg maxhr angina dep \
0 70.0 1.0 4.0 130.0 322.0 0.0 2.0 109.0 0.0 2.4
1 67.0 0.0 3.0 115.0 564.0 0.0 2.0 160.0 0.0 1.6
2 57.0 1.0 2.0 124.0 261.0 0.0 0.0 141.0 0.0 0.3
3 64.0 1.0 4.0 128.0 263.0 0.0 0.0 105.0 1.0 0.2
4 74.0 0.0 2.0 120.0 269.0 0.0 2.0 121.0 1.0 0.2
.. ... ... ... ... ... ... ... ... ... ...
265 52.0 1.0 3.0 172.0 199.0 1.0 0.0 162.0 0.0 0.5
266 44.0 1.0 2.0 120.0 263.0 0.0 0.0 173.0 0.0 0.0
267 56.0 0.0 2.0 140.0 294.0 0.0 2.0 153.0 0.0 1.3
268 57.0 1.0 4.0 140.0 192.0 0.0 0.0 148.0 0.0 0.4
269 67.0 1.0 4.0 160.0 286.0 0.0 2.0 108.0 1.0 1.5
exercise fluor thal diagnosis
0 2.0 3.0 3.0 2
1 2.0 0.0 7.0 1
2 1.0 0.0 7.0 2
3 2.0 1.0 7.0 1
4 1.0 1.0 3.0 1
.. ... ... ... ...
265 1.0 0.0 7.0 1
266 1.0 0.0 7.0 1
267 2.0 0.0 3.0 1
268 2.0 0.0 6.0 1
269 2.0 3.0 3.0 2
[270 rows x 14 columns]
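If you want to confirm how the columns were parsed, you can inspect the inferred dtypes; judging by the printed output, most columns should come back as float64, so this is only a quick check rather than part of the original answer:
# show the dtype pandas inferred for each column
print (df.dtypes)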
Then there are no Nones and no missing values in the data:
print (df.isna().any(axis=1).any())
False
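If a single True/False flag is not enough, a per-column overview is also easy to get; as a small sketch assuming the same df as above, isna().sum() counts the missing values in each column (all zeros for this dataset):
# count missing values per column
print (df.isna().sum())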
EDIT:
If you need to replace missing values or Nones with a scalar, use fillna:
df = df.fillna(0)
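fillna also accepts a dict mapping column names to fill values, so different columns can be filled differently; a minimal sketch, where the fill values are placeholders chosen only for illustration:
# fill 'chol' with its mean and 'thal' with 0 (illustrative values, not from the original answer)
df = df.fillna({'chol': df['chol'].mean(), 'thal': 0})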