I'm very new to Python and I have searched a lot for a question similar to mine. I would like to do something similar to what is explained in this question: Computing averages of records from multiple files with python.
However, instead of taking the mean of every value (in that example all values are numeric), I would like to take the mean of a single column and keep the values of the other columns unchanged.
For example:
fileA.txt:
0.003 0.0003 3 Active
0.003 0.0004 1 Active
fileB.txt:
0.003 0.0003 1 Active
0.003 0.0004 5 Active
and I would like to generate the following output file
output.txt
0.003 0.0003 2 Active
0.003 0.0004 3 Active
Although columns 1 and 2 are numeric too, they have the same value at the same position across all 100 files. So I'm only interested in the element-wise mean across the 100 files for column 3.
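To make the transformation concrete, here is a minimal sketch of what I'm after, using the two example files above as hardcoded lists (I'm formatting the averaged column as an integer only because that's what my desired output shows):

```python
# The two example files above, as lists of rows.
file_a = [["0.003", "0.0003", "3", "Active"],
          ["0.003", "0.0004", "1", "Active"]]
file_b = [["0.003", "0.0003", "1", "Active"],
          ["0.003", "0.0004", "5", "Active"]]

output = []
for rows in zip(file_a, file_b):
    # Average column 3 (index 2) across the files at this row position.
    mean_col3 = sum(float(r[2]) for r in rows) / len(rows)
    # The other columns are identical across files, so take them from the first.
    first = rows[0]
    output.append([first[0], first[1], str(int(mean_col3)), first[3]])

print(output)
# [['0.003', '0.0003', '2', 'Active'], ['0.003', '0.0004', '3', 'Active']]
```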
Also, although the code in the question Computing averages of records from multiple files with python works for reading my files, it is not practical when you have lots of files. How can I optimize that?
I managed to read my files using the following code:
import numpy as np

result = []
for i in my_files:
    # loadtxt already returns an ndarray; read as strings because of column 4
    a = np.loadtxt(i, dtype=str, delimiter='\t', skiprows=1)
    result.append(a)
result = np.array(result)  # shape: (n_files, n_rows, n_cols)
I used code similar to what was suggested in this question: initialize a numpy array.
Each of my files has about 1500 rows and 4 columns. I tried to use np.mean, but it does not work, probably because some of my data are strings.
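For reference, this is roughly the averaging step I imagine after loading, though I'm not sure it's the best approach. Here `result` is built by hand from the example data instead of np.loadtxt, just so the sketch is self-contained:

```python
import numpy as np

# Shape (n_files, n_rows, 4), dtype str, like the array built by my loop above.
result = np.array([
    [["0.003", "0.0003", "3", "Active"], ["0.003", "0.0004", "1", "Active"]],
    [["0.003", "0.0003", "1", "Active"], ["0.003", "0.0004", "5", "Active"]],
])

# np.mean fails on the string array, so cast only column 3 (index 2) to float
# and average over the file axis.
col3_mean = result[:, :, 2].astype(float).mean(axis=0)

# Keep the other columns from the first file (they are identical everywhere)
# and write the averaged column back as strings.
output = result[0].copy()
output[:, 2] = col3_mean.astype(int).astype(str)
print(output)
```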
Thanks in advance for your help!