I have an array of strings in which some elements, such as 'na', cannot be converted to float with x.astype(float),
as given here.
Please suggest a better way than the one I used. My procedure is shown below (it is a snippet from my Jupyter notebook; I have included the intermediate steps only to demonstrate the changes):
In [4]: val_inc
Out [4]:
array(['na', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
'39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
'37.2844', '39.5835', 43.9194, '42.5485', '36.9052', 'na', 41.9264,
45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
'38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
'40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
40.3944, '40.2466', '32.2567', 'na', '38.8594', '43.947', 41.7973,
'41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
'38.9249', '33.2077', '42.4053', '42.559'], dtype=object)
In [5]: val_inc[val_inc == 'na']='0'
In [6]: val_inc
Out [6]:
array(['0', '38.012', '38.7816', '38.0736', '40.7118', '44.7382',
'39.6416', '38.9177', '36.9031', 43.2611, '38.2732', 40.7129,
'37.2844', '39.5835', 43.9194, '42.5485', '36.9052', '0', 41.9264,
45.3568, '44.6239', 38.1079, 45.2393, '32.785', '44.6239',
'38.0216', '38.4608', '42.5644', '35.3127', 33.2936, '33.0556',
'40.4476', 35.6581, '35.5574', '43.1096', '34.4751', 42.0554,
40.3944, '40.2466', '32.2567', '0', '38.8594', '43.947', 41.7973,
'41.8105', 40.3797, 31.2868, '45.3644', '40.7177', '41.8558',
'38.9249', '33.2077', '42.4053', '42.559'], dtype=object)
In [7]: val_inc = val_inc.astype(float)
In [8]: val_inc
Out [8]:
array([ 0. , 38.012 , 38.7816, 38.0736, 40.7118, 44.7382,
39.6416, 38.9177, 36.9031, 43.2611, 38.2732, 40.7129,
37.2844, 39.5835, 43.9194, 42.5485, 36.9052, 0. ,
41.9264, 45.3568, 44.6239, 38.1079, 45.2393, 32.785 ,
44.6239, 38.0216, 38.4608, 42.5644, 35.3127, 33.2936,
33.0556, 40.4476, 35.6581, 35.5574, 43.1096, 34.4751,
42.0554, 40.3944, 40.2466, 32.2567, 0. , 38.8594,
43.947 , 41.7973, 41.8105, 40.3797, 31.2868, 45.3644,
40.7177, 41.8558, 38.9249, 33.2077, 42.4053, 42.559 ])
In [9]: np.mean(val_inc[val_inc!=0.])
Out [9]: 39.587374509803915
In [10]: val_inc[val_inc==0.]=np.mean(val_inc[val_inc!=0.])
In [11]: val_inc
Out [11]:
array([ 39.58737451, 38.012 , 38.7816 , 38.0736 ,
40.7118 , 44.7382 , 39.6416 , 38.9177 ,
36.9031 , 43.2611 , 38.2732 , 40.7129 ,
37.2844 , 39.5835 , 43.9194 , 42.5485 ,
36.9052 , 39.58737451, 41.9264 , 45.3568 ,
44.6239 , 38.1079 , 45.2393 , 32.785 ,
44.6239 , 38.0216 , 38.4608 , 42.5644 ,
35.3127 , 33.2936 , 33.0556 , 40.4476 ,
35.6581 , 35.5574 , 43.1096 , 34.4751 ,
42.0554 , 40.3944 , 40.2466 , 32.2567 ,
39.58737451, 38.8594 , 43.947 , 41.7973 ,
41.8105 , 40.3797 , 31.2868 , 45.3644 ,
40.7177 , 41.8558 , 38.9249 , 33.2077 ,
42.4053 , 42.559 ])
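
For comparison, here is a minimal sketch of the same steps condensed into one function. It marks the missing entries with NaN instead of 0, so a genuine 0.0 in the data would not be overwritten by the mean. The function name `to_float_with_mean` and the short sample array are hypothetical and used only for illustration; this is one possible approach, not necessarily the best one.

```python
import numpy as np

# Hypothetical short sample mirroring the mixed str/float object array above.
val_inc = np.array(['na', '38.012', 43.2611, '36.9052', 'na', 41.9264],
                   dtype=object)


def to_float_with_mean(values, missing='na'):
    """Return a float array with `missing` markers replaced by the mean
    of the remaining entries (hypothetical helper, for illustration)."""
    values = np.asarray(values, dtype=object)
    mask = values == missing                   # True where the marker appears
    out = np.full(values.shape, np.nan)        # float array, NaN as placeholder
    out[~mask] = values[~mask].astype(float)   # convert only the valid entries
    out[mask] = np.nanmean(out)                # mean that ignores the NaNs
    return out


result = to_float_with_mean(val_inc)
print(result)
```

Using NaN as the placeholder also means np.nanmean can compute the mean directly, without the val_inc != 0. filter.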