
I have only been using Python for a few days, so I am a real beginner! I have text files with 5 columns and 30-40k rows that look like this:

2013-08-29T15:11:18.55912   0.019494552 0.110042184 0.164076427 0.587849877
2013-08-29T15:11:18.65912   0.036270974 0.097213155 0.122628797 0.556928624
2013-08-29T15:11:18.75912   0.055350041 0.104121094 0.121641949 0.593113069
2013-08-29T15:11:18.85912   0.057159263 0.107410588 0.198122695 0.591797271
2013-08-29T15:11:18.95912   0.05288292  0.102476346 0.172958062 0.591139372
2013-08-29T15:11:19.05912   0.043507861 0.104121094 0.162102731 0.598376261
2013-08-29T15:11:19.15912   0.068343545 0.102805296 0.168517245 0.587849877
2013-08-29T15:11:19.25912   0.054527668 0.105765841 0.184306818 0.587191978
2013-08-29T15:11:19.35912   0.055678991 0.107739538 0.169997517 0.539165352
2013-08-29T15:11:19.45912   0.05321187  0.102476346 0.167530397 0.645744989
2013-08-29T15:11:19.55912   0.055021092 0.103134245 0.158155337 0.604955251
2013-08-29T15:11:19.65912   0.054363193 0.103463195 0.154207944 0.587191978
...
...
...

Q: How can I take the average of every n rows of each column and write the results to a new CSV file, keeping each column's averages in the same column, with the first column being a "count" (0, 1, 2, ...) for every row whose adjacent column contains a number? For example, for 12 rows of data and n=4 the result looks like this:

0   0.042068708 0.104252673 0.155885586 0.584165643
1   0.054815499 0.103792144 0.171971214 0.591139372
2   0.054568787 0.104203331 0.162472799 0.594264393
...
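A minimal NumPy sketch of the block average, assuming the numeric columns are already loaded into a 2-D array (one row per sample) and that leftover rows at the end that do not fill a complete group of n are simply dropped. block_means is a hypothetical helper name, not a NumPy function:

```python
import numpy as np


def block_means(data, n):
    """Average every n consecutive rows of a 2-D array, column by column.

    Rows left over at the end (fewer than n) are discarded.
    """
    data = np.asarray(data, dtype=float)
    usable = (data.shape[0] // n) * n                      # largest multiple of n
    blocks = data[:usable].reshape(-1, n, data.shape[1])   # (groups, n, columns)
    return blocks.mean(axis=1)                             # one row of means per group


# Example: 12 rows x 4 columns with n = 4 gives 3 rows of averages
demo = np.arange(48, dtype=float).reshape(12, 4)
print(block_means(demo, 4))
```

The reshape groups the rows without an explicit Python loop, so it should stay fast even for 30-40k rows.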

Progress! This piece of code fills the first column with a count:

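import numpy as np

y = data[:, 1]               # second column (first numeric one)
xcount = np.arange(len(y))   # 0, 1, 2, ... one value per row
data[:, 0] = xcount          # overwrite the first column with the count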
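If the averages are already in their own 2-D array (called means here, a hypothetical name holding made-up values), the count column can be prepended in one step with np.column_stack instead of overwriting a data column. This is one possible approach, sketched under that assumption:

```python
import numpy as np

# means: one row of column averages per group (illustrative values only)
means = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.6, 0.7, 0.8]])

count = np.arange(len(means))            # 0, 1, 2, ... one per output row
table = np.column_stack((count, means))  # count becomes the first column
print(table)
```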

  • I tried the moving-average method explained here. The plot looks good, but the resulting values are not the same as the averages I need!
  • I managed to get the average of each column using the following code, but it only works when the first column (the timestamp strings) is deleted from the input file. Also, all the results end up in a single column:


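import csv
import numpy as np

data = np.loadtxt('in.txt')  # fails unless the timestamp column is removed first
with open('out.csv', 'w', newline='') as outfile:  # Python 3: text mode, not 'wb'
    writer = csv.writer(outfile, delimiter='\t')
    for i in range(4):  # the 4 numeric columns
        result = data[:, i].mean()  # mean of the entire column
        writer.writerow([float(result)])  # one value per row, so all results land in one column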
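A sketch that addresses both problems, assuming whitespace-separated input with the timestamp in column 0: np.loadtxt's usecols argument skips the string column, and writing one list per group puts each group's averages across the columns of a single output row. average_file is a hypothetical helper; the demo uses the first four rows from the question held in memory:

```python
import csv
import io

import numpy as np


def average_file(infile, outfile, n):
    """Write the mean of every n rows, prefixed by a 0-based count.

    infile/outfile can be open file objects (or in-memory StringIO).
    Leftover rows (fewer than n) at the end are dropped.
    """
    # usecols skips the timestamp string in column 0; only columns 1-4 are parsed
    data = np.loadtxt(infile, usecols=(1, 2, 3, 4), ndmin=2)
    usable = (len(data) // n) * n
    means = data[:usable].reshape(-1, n, data.shape[1]).mean(axis=1)

    writer = csv.writer(outfile, delimiter='\t')
    for count, row in enumerate(means):
        # tolist() gives plain Python floats; one output row per group of n
        writer.writerow([count] + row.tolist())


sample = io.StringIO(
    "2013-08-29T15:11:18.55912   0.019494552 0.110042184 0.164076427 0.587849877\n"
    "2013-08-29T15:11:18.65912   0.036270974 0.097213155 0.122628797 0.556928624\n"
    "2013-08-29T15:11:18.75912   0.055350041 0.104121094 0.121641949 0.593113069\n"
    "2013-08-29T15:11:18.85912   0.057159263 0.107410588 0.198122695 0.591797271\n"
)
out = io.StringIO()
average_file(sample, out, 4)
print(out.getvalue())
```

For the real files, pass open('in.txt') and open('out.csv', 'w', newline='') in place of the StringIO objects.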

The pandas rolling mean also didn't work for me, because its window moves forward one row at a time instead of jumping n rows.
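The difference between a rolling mean and a block mean can be illustrated with plain NumPy. Here np.convolve stands in for a rolling window (it is not the pandas API): a window of width n yields one value per starting row, and keeping only every n-th of those values gives the non-overlapping block averages. The small array x is illustrative:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
n = 4

# Sliding (rolling) mean: the window moves one row at a time -> len(x) - n + 1 values
rolling = np.convolve(x, np.ones(n) / n, mode='valid')

# Block mean: non-overlapping groups of n rows
block = x[: len(x) // n * n].reshape(-1, n).mean(axis=1)

# Every n-th rolling value starts at a block boundary, so it equals a block mean
print(rolling)
print(block)
print(np.allclose(rolling[::n], block))
```

In pandas itself, non-overlapping groups are usually formed with a groupby on the row position integer-divided by n rather than with a rolling window.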

Any advice, references or hints would be really appreciated.

PyLearner
  • You don't need complicated statistical functions. Simply break your data into samples of size n and apply a regular mean to each column of each sample independently. – Asad Saeeduddin Nov 21 '13 at 22:38
  • Thanks for your comment, Asad, but I didn't get what you meant. It's about 30,000 rows of data. Could you please clarify how to break it into smaller samples? – PyLearner Nov 24 '13 at 04:05
  • Suppose you've read your data into some sort of data structure, e.g. a list with each item representing a "record". Segment this list into a list of record groups, each group containing n records (more on how to do this [here](http://stackoverflow.com/questions/1218793/segment-a-list-in-python)). On each iteration, obtain a list, tuple, or whatever of the desired column values for the current group of records, and apply a plain old mean to it. – Asad Saeeduddin Nov 24 '13 at 05:29
  • Thanks a lot, Asad. It seems to be working :) I just need to write the results into columns rather than rows, which I'll post as a new question. Would you post your comments as an answer so that I can accept it? – PyLearner Nov 25 '13 at 02:11
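Asad's suggestion (segment the records into groups of n, then take a plain mean of each column within each group) can be sketched in plain Python without NumPy. This assumes records is a list of rows of floats with the timestamp already dropped; chunk and group_column_means are hypothetical helper names, and the sample records are illustrative:

```python
from statistics import mean


def chunk(records, n):
    """Split a list into consecutive groups of n items (the last may be shorter)."""
    return [records[i:i + n] for i in range(0, len(records), n)]


def group_column_means(records, n):
    """Return one list of column means per complete group of n records."""
    result = []
    for group in chunk(records, n):
        if len(group) < n:  # skip an incomplete trailing group
            continue
        # zip(*group) turns the group's rows into columns
        result.append([mean(column) for column in zip(*group)])
    return result


records = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
for count, row in enumerate(group_column_means(records, 2)):
    print(count, *row)  # count first, then each column's mean across the row
```

Printing (or writing) count followed by the unpacked row is what puts each group's averages side by side in columns rather than one per row.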

0 Answers