I'm trying to work with a dataset that ends up containing None values.

My loading code is the following:

import pandas as pd
import io
import requests
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/heart.dat"
s = requests.get(url).content
s = s.decode('utf-8')
s_rows = s.split('\n')
s_rows_cols = [each.split() for each in s_rows]
header_row = ['age','sex','chestpain','restBP','chol','sugar','ecg','maxhr','angina','dep','exercise','fluor','thal','diagnosis']
c = pd.DataFrame(s_rows_cols, columns = header_row)

and the output of c is: (image not available)

But it seems that some columns contain None values. How do I replace these None values with zeros?

Thanks

user2535338

1 Answer

I think that is not necessary if you use read_csv with sep set to the regex \s+ (one or more whitespace characters) as the separator, and the names parameter to specify the column names:

import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/heart/heart.dat"

cols = ['age','sex','chestpain','restBP','chol','sugar','ecg',
        'maxhr','angina','dep','exercise','fluor','thal','diagnosis']
# raw string, so that \s is not treated as an invalid string escape
df = pd.read_csv(url, sep=r'\s+', names=cols)

print (df)
      age  sex  chestpain  restBP   chol  sugar  ecg  maxhr  angina  dep  \
0    70.0  1.0        4.0   130.0  322.0    0.0  2.0  109.0     0.0  2.4   
1    67.0  0.0        3.0   115.0  564.0    0.0  2.0  160.0     0.0  1.6   
2    57.0  1.0        2.0   124.0  261.0    0.0  0.0  141.0     0.0  0.3   
3    64.0  1.0        4.0   128.0  263.0    0.0  0.0  105.0     1.0  0.2   
4    74.0  0.0        2.0   120.0  269.0    0.0  2.0  121.0     1.0  0.2   
..    ...  ...        ...     ...    ...    ...  ...    ...     ...  ...   
265  52.0  1.0        3.0   172.0  199.0    1.0  0.0  162.0     0.0  0.5   
266  44.0  1.0        2.0   120.0  263.0    0.0  0.0  173.0     0.0  0.0   
267  56.0  0.0        2.0   140.0  294.0    0.0  2.0  153.0     0.0  1.3   
268  57.0  1.0        4.0   140.0  192.0    0.0  0.0  148.0     0.0  0.4   
269  67.0  1.0        4.0   160.0  286.0    0.0  2.0  108.0     1.0  1.5   

     exercise  fluor  thal  diagnosis  
0         2.0    3.0   3.0          2  
1         2.0    0.0   7.0          1  
2         1.0    0.0   7.0          2  
3         2.0    1.0   7.0          1  
4         1.0    1.0   3.0          1  
..        ...    ...   ...        ...  
265       1.0    0.0   7.0          1  
266       1.0    0.0   7.0          1  
267       2.0    0.0   3.0          1  
268       2.0    0.0   6.0          1  
269       2.0    3.0   3.0          2  

[270 rows x 14 columns]

The data then contain no Nones and no missing values:

print(df.isna().any(axis=1).any())
False
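
A likely source of the None values in the question is the manual split: if the downloaded text ends with a newline (an assumption about the file, not something verified here), s.split('\n') yields a final empty string, which becomes an empty row that pandas has to pad. The sketch below uses a made-up three-column sample in place of the real file to show the empty row, and shows that read_csv on the same text has no such row:

```python
import io
import pandas as pd

# Made-up excerpt standing in for heart.dat; note the trailing newline.
sample = "70.0 1.0 4.0\n67.0 0.0 3.0\n"

# Manual parsing as in the question: the last element is an empty row.
rows = [line.split() for line in sample.split("\n")]

# read_csv skips the blank trailing line and parses the numbers as floats.
cols = ["age", "sex", "chestpain"]
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", names=cols)

has_missing = df.isna().any().any()
print(rows[-1], df.shape, has_missing)
```

Note also that the manual approach leaves every value as a string, while read_csv infers numeric dtypes.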

EDIT:

If you need to replace missing values or Nones with a scalar, use fillna:

c = c.fillna(0)
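
As a minimal sketch of that replacement, assuming the frame holds strings and None as the manually built c in the question does: the small frame named raw below is invented for illustration. Converting to numbers first with pd.to_numeric is an extra step not in the original answer, added so that the filled zeros and the existing values share one numeric dtype.

```python
import pandas as pd

# Invented stand-in for the question's frame: string values plus a None.
raw = pd.DataFrame({"age": ["70.0", None], "chol": ["322.0", "564.0"]})

# Convert each column to numbers (None becomes NaN), then fill NaN with 0.
numeric = raw.apply(pd.to_numeric, errors="coerce").fillna(0)

print(numeric)
```

Filling with 0 can distort statistics such as the column mean, so it may be worth checking whether the missing entries are real data gaps or just a parsing artifact before replacing them.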
jezrael