Calculate Mean for each CSV row

Question

I have 3 CSV files named file1, file2 and file3. Each CSV has 3 columns and 5653 rows:

1   0   -95
2   0   -94
3   0   -93
...
51  0   -93
0   1   -92
1   1   -91
2   1   -90
..

The first column is an x variable, the second is a y variable, and the third is a measured value whose mean I want.

What I want to do is:

read the first row of file 1
read the first row of file 2
read the first row of file 3, then calculate the mean of the three measured values.

So for example:

file1 row1 -98 
file2 row1 -97
file3 row1 -95

mean -96.666666667

I want to write that mean into a new CSV file with the following format:

 1,0,mean_of_row1 (which would be -96.666666667)
 2,0,mean_of_row2
 3,0,mean_of_row3
 4,0,mean_of_row4

Currently I'm able to calculate the mean of the measurement column of each file and store it as a row in a results file:

import pandas as pd
import numpy as np

csv_file_list = ["file1.csv", "file2.csv", "file3.csv"]
result_csv = "result.csv"

# Text mode: writing a str to a file opened with 'wb' fails on Python 3
with open(result_csv, 'w') as rf:
    for idx, csv_file in enumerate(csv_file_list):
        csv_data = pd.read_csv(csv_file).values
        mean_measured = np.mean(csv_data[:, 2])
        rf.write(','.join([str(0), str(idx), str(mean_measured)+"\n"]))

But how can I get the mean per row across the three files instead? Thanks so far.

Shijo · Accepted Answer · 2017-01-26T14:05:20.160

2

Pandas is really helpful in this situation. You can avoid all explicit looping and read each CSV straight into a DataFrame. Then concatenate the three DataFrames side by side and use pandas.DataFrame.mean to compute the row-wise mean of the value columns.

pandas.read_csv can limit the number of rows it reads via the nrows parameter.

import pandas as pd

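# names= assigns column labels (the sample data has no header row);
# nrows= caps how many rows are read from each file
df1 = pd.read_csv('file1.txt', names=['x1', 'Y1', 'Value1'], nrows=5356)
df2 = pd.read_csv('file2.txt', names=['x2', 'Y2', 'Value2'], nrows=5356)
df3 = pd.read_csv('file3.txt', names=['x3', 'Y3', 'Value3'], nrows=5356)

# Place the three frames side by side, aligned on the row index
df_concat = pd.concat([df1, df2, df3], axis=1)
print(df_concat)

# Row-wise mean of the three measured columns
df_concat['meanvalue'] = df_concat[['Value1', 'Value2', 'Value3']].mean(axis=1)
print(df_concat.to_csv(columns=['meanvalue'], index=False))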

output

meanvalue
-96.5
-97.0
-86.0
-95.0
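
To get the x,y,mean layout asked for in the question rather than a single column, a minimal sketch along the same lines might look like the following. It assumes the three files are comma-separated, have no header row, and list the same (x, y) rows in the same order. The inline strings are made-up stand-ins for the real files, and `row_means` is a hypothetical helper name, not part of pandas.

```python
import io

import pandas as pd


def row_means(sources, names=("x", "y", "value")):
    """Average the third column row by row across several CSV sources.

    Assumption: every source has the same rows in the same order and
    no header row.
    """
    frames = [pd.read_csv(src, header=None, names=list(names)) for src in sources]
    # One measured column per file, side by side, aligned on the row index.
    values = pd.concat([frame["value"] for frame in frames], axis=1)
    # Take x and y from the first file and attach the row-wise mean.
    result = frames[0][["x", "y"]].copy()
    result["mean"] = values.mean(axis=1)
    return result


# Made-up stand-ins for file1.csv, file2.csv and file3.csv.
file1 = io.StringIO("1,0,-98\n2,0,-94\n")
file2 = io.StringIO("1,0,-97\n2,0,-93\n")
file3 = io.StringIO("1,0,-95\n2,0,-92\n")

result = row_means([file1, file2, file3])
# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = result.to_csv(index=False, header=False)
print(csv_text)
```

With real data, file paths can be passed in place of the StringIO objects, and `result.to_csv("result.csv", index=False, header=False)` would write the rows to disk.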

edited Jan 26 '17 at 14:05

answered Jan 25 '17 at 20:07

Shijo

That should be okay if you have enough memory – Shijo Jan 25 '17 at 21:19
Is there a way to print only the meanvalue? Like print df_concat['meanvalue'], but without the index? – Skat1337 Jan 26 '17 at 11:19
`for a in range (0,5356,1): data=df_concat['meanvalue'][a]` did it for me because I need exactly 5356 rows – Skat1337 Jan 26 '17 at 12:16
Updated the answer; the code now reads only the first 5356 rows and prints meanvalue without the index – Shijo Jan 26 '17 at 14:05

Fruitspunchsamurai · Answer 2 · 2017-01-25T20:26:10.343

1

You may just want to build one large pandas table in that case using join. The join key needs to be the index of each DataFrame.

This way, you join the rows where the x and y values are the same. You end up with 5 columns: x, y, and the three value columns you want to average. You can then simply create a new column holding the mean of those three values for each row. Whichever of x or y is unique can be used as the index.

The pandas merge function should help you merge based on the row values themselves.

The SQL equivalent of what you are doing is an inner join on the x and y values, whose combination I assume is unique per CSV file.
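
The merge approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the files are comma-separated with no header row, each (x, y) pair occurs once per file, and `merged_row_means` plus the inline sample rows are hypothetical stand-ins rather than anything from the question.

```python
import io
from functools import reduce

import pandas as pd


def merged_row_means(sources):
    """Inner-join the sources on (x, y) and average their value columns."""
    frames = [
        pd.read_csv(src, header=None, names=["x", "y", f"value{i}"])
        for i, src in enumerate(sources, start=1)
    ]
    # Inner join on (x, y): only coordinates present in every file survive.
    merged = reduce(
        lambda left, right: left.merge(right, on=["x", "y"], how="inner"),
        frames,
    )
    value_cols = [col for col in merged.columns if col.startswith("value")]
    merged["mean"] = merged[value_cols].mean(axis=1)
    return merged[["x", "y", "mean"]]


# Made-up sample data: b lists its rows in a different order, and c has
# an extra row that the other two files lack.
a = io.StringIO("1,0,-98\n2,0,-94\n")
b = io.StringIO("2,0,-93\n1,0,-97\n")
c = io.StringIO("1,0,-95\n2,0,-92\n3,0,-90\n")

merged = merged_row_means([a, b, c])
print(merged)
```

Unlike a positional concat, matching on (x, y) does not depend on the files sharing the same row order, and rows missing from any file are dropped instead of being averaged against the wrong partner.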

edited Jan 25 '17 at 20:26

answered Jan 25 '17 at 20:06

Fruitspunchsamurai

But how am I able to read a row of a specific column? Like Row1 Column2, Row2 Column2. With `np.mean(csv_data[:, 2])` it only looks into the columns – Skat1337 Jan 25 '17 at 20:24
[This answer](http://stackoverflow.com/questions/34734940/in-pandas-how-to-get-average-of-several-column-values-for-each-row-in-data-fram) should help you take the average across columns. – Fruitspunchsamurai Jan 25 '17 at 20:28
