I would like to replace the missing data points in each column of a text file with that column's mean, using Python.
My plan was:
- Read each column from the text file
- Calculate the mean of each column
- Replace nan with the calculated mean in each column
- Write the result to a new text file
I think I am fine up to step 2, but I am having trouble with steps 3 and 4. My code is as follows:
for columns in (raw.strip().split() for raw in f):
    a.append(columns[c])
x = np.array(a, float)
y = np.ma.masked_array(x, np.isnan(x))
y1 = np.mean(y)
a1 = ' '.join(a)
a1.replace("nan", "y1")
f1 = open("practice.txt", "w")
f1.write(a1)
As you can see, the problem is in replacing nan with the mean using the 'replace' command, because it only works on strings. I would really appreciate any help or suggestions. Part of my data looks like this:
1.60566 nan 2.00755 2.32407
1.502 nan 1.36522 1.555
0.63333 nan 1.56102 2.08929
nan nan 0.87451 1.06667
2.5 nan 1.88889 1.0661
3.88197 nan 3.0875 2.75909
4.02692 nan 3.36154 3.92895
5.9907 nan 5.29535 5.82245
6.16111 2.67317 6.04074 6.25588
6.88269 2.62241 5.43958 6.07
5.92 2.48627 5.91818 6.75862
6.93429 6.17333 7.34 7.76538
8.25143 7.925 7.8087 8.725
8.1025 8.19429 8.11563 8.80937
8.12105 8.145 7.83889 8.37576
7.47292 8.65 8.35536 8.61081
8.10392 8.66032 8.74082 9.65484
10.03036 10.74727 10.634 10.50961
I want to replace those nans with the mean value of each column.
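
For reference, here is a minimal sketch of the four steps done on the numeric array rather than on strings. One reason the string approach fails is that str.replace returns a new string instead of modifying a1, and "y1" in quotes is the literal text y1, not the computed mean. The function names and the file names in the usage note are hypothetical. The sketch assumes the file is whitespace-separated, every row has the same number of columns, and five decimal places are acceptable in the output.

```python
import numpy as np


def fill_nan_with_column_means(data):
    """Return a copy of a 2-D array with each nan replaced by its column mean."""
    data = np.asarray(data, dtype=float)
    # Mean of every column, ignoring nan entries.
    col_means = np.nanmean(data, axis=0)
    # col_means has one value per column and is broadcast across the rows.
    return np.where(np.isnan(data), col_means, data)


def fill_file(src, dst):
    """Read a whitespace-separated numeric file, fill nans, write a new file."""
    # np.loadtxt parses the text "nan" as a float nan.
    data = np.loadtxt(src, ndmin=2)
    # fmt="%.5f" is an assumption about the desired output precision.
    np.savetxt(dst, fill_nan_with_column_means(data), fmt="%.5f")


# Three rows (first three columns) taken from the sample data above.
sample = np.array([
    [np.nan, np.nan, 0.87451],
    [6.16111, 2.67317, 6.04074],
    [6.88269, 2.62241, 5.43958],
])
filled = fill_nan_with_column_means(sample)
print(filled)
```

Calling fill_file("data.txt", "filled.txt") would then cover reading and writing, with both file names being placeholders. A column that contains only nan has no mean, so np.nanmean returns nan for it (with a RuntimeWarning) and those entries stay nan.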