I have 3 CSV files named file1, file2 and file3. Each CSV has 3 columns and 5653 rows:

1   0   -95
2   0   -94
3   0   -93
...
51  0   -93
0   1   -92
1   1   -91
2   1   -90
..

The first column is an x variable, the second is a y variable, and the third is a measured value that I want to average.

What I want to do is:

  • read first row of file 1
  • read first row of file 2
  • read first row of file 3 and then compute the mean of the three measured values.

So for example:

file1 row1 -98 
file2 row1 -97
file3 row1 -95

mean -96.666666667

I want to write that mean into a new CSV file with the following format:

 1,0,mean_of_row1 (which would be -96.666666667)
 2,0,mean_of_row2
 3,0,mean_of_row3
 4,0,mean_of_row4

Currently I'm able to calculate the mean of the whole measurement column of each file and store it as a row in a results file:

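import numpy as np
import pandas as pd

csv_file_list = ["file1.csv", "file2.csv", "file3.csv"]
result_csv = "result.csv"

# Text mode ('w'): writing str to a file opened with 'wb' fails on Python 3.
with open(result_csv, 'w') as rf:
    for idx, csv_file in enumerate(csv_file_list):
        # header=None keeps the first data row from being read as column names.
        csv_data = pd.read_csv(csv_file, header=None).values
        # Mean of the entire third column, i.e. one value per file.
        mean_measured = np.mean(csv_data[:, 2])
        rf.write(','.join([str(0), str(idx), str(mean_measured)]) + "\n")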

But how can I compute the mean row by row across the three files instead? Thanks so far.

Skat1337

2 Answers

Pandas is really helpful in this situation. You can avoid explicit loops by reading each CSV into a DataFrame, concatenating the three DataFrames side by side, and calling pandas.DataFrame.mean on the value columns with axis=1 to get a row-wise mean.

pandas.read_csv can also limit the number of rows read via its nrows parameter.

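import pandas as pd

# names= supplies column labels because the files have no header row.
# nrows caps the rows read; 5653 is the row count given in the question.
df1 = pd.read_csv('file1.txt', names=['x1', 'Y1', 'Value1'], nrows=5653)
df2 = pd.read_csv('file2.txt', names=['x2', 'Y2', 'Value2'], nrows=5653)
df3 = pd.read_csv('file3.txt', names=['x3', 'Y3', 'Value3'], nrows=5653)

# axis=1 places the frames side by side, aligned on the row index.
df_concat = pd.concat([df1, df2, df3], axis=1)
print(df_concat)

# Row-wise mean of the three measured values.
df_concat['meanvalue'] = df_concat[['Value1', 'Value2', 'Value3']].mean(axis=1)
print(df_concat.to_csv(columns=['meanvalue'], index=False))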

output

meanvalue
-96.5
-97.0
-86.0
-95.0
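
The output above contains only the mean column, while the question asks for x,y,mean rows. A minimal sketch of that variant follows, assuming comma-separated files without a header row and identical row order in all three files. The in-memory sample data, the column names and the helper name rowwise_mean are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-ins for file1.csv, file2.csv and file3.csv
# (assumed: comma-separated, no header row, same row order in each file).
sample_files = [
    io.StringIO("1,0,-98\n2,0,-94\n3,0,-93\n"),
    io.StringIO("1,0,-97\n2,0,-95\n3,0,-90\n"),
    io.StringIO("1,0,-95\n2,0,-96\n3,0,-93\n"),
]


def rowwise_mean(sources):
    """Return a DataFrame with x, y and the per-row mean across all sources."""
    frames = [
        pd.read_csv(src, header=None, names=["x", "y", "value"])
        for src in sources
    ]
    # One value column per file, side by side, aligned on the row index.
    values = pd.concat(
        [frame["value"] for frame in frames], axis=1, keys=range(len(frames))
    )
    # Take x and y from the first file; the others are assumed to match it.
    result = frames[0][["x", "y"]].copy()
    result["mean"] = values.mean(axis=1)
    return result


result = rowwise_mean(sample_files)
# header=False and index=False give bare "x,y,mean" lines.
csv_text = result.to_csv(index=False, header=False)
print(csv_text)
```

Passing a path such as "result.csv" as the first argument of to_csv would write the same lines to a file instead of returning them as a string.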
Shijo

In that case you may just want to build one large pandas table using a join. A join matches rows on the index of each DataFrame, so the key needs to be set as the index first.

This way, rows are matched where the x and y values are the same. You end up with 5 columns: x, y, and one value column per file. You can then add a new column holding the mean of the last 3 values in each row. Whatever uniquely identifies a row should serve as the key; in the sample data neither x nor y is unique on its own, so that would be the (x, y) pair.

The pandas merge function does the same thing directly on columns, without setting an index.

The SQL equivalent of what you are doing is an inner join on the key columns, which I assume are unique per CSV file.
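
The merge approach described above could look roughly like this. It is a sketch under assumptions: comma-separated files with no header row, the (x, y) pair unique within each file, and made-up in-memory data and column names (value1, value2, value3). The second sample file is deliberately in a different row order to show that the merge aligns on the key rather than on position.

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the three CSV files (no header row).
sources = [
    io.StringIO("1,0,-98\n2,0,-94\n0,1,-92\n"),
    io.StringIO("0,1,-90\n1,0,-97\n2,0,-95\n"),  # different row order
    io.StringIO("1,0,-95\n2,0,-96\n0,1,-91\n"),
]

# Give each file's measurement column its own name so merges do not clash.
frames = [
    pd.read_csv(src, header=None, names=["x", "y", f"value{i}"])
    for i, src in enumerate(sources, start=1)
]

# Inner-join the frames pairwise on the (x, y) key.
merged = reduce(
    lambda left, right: left.merge(right, on=["x", "y"], how="inner"),
    frames,
)

# Row-wise mean over the three value columns.
value_cols = [col for col in merged.columns if col.startswith("value")]
merged["mean"] = merged[value_cols].mean(axis=1)

print(merged[["x", "y", "mean"]].to_csv(index=False, header=False))
```

With an inner join, any (x, y) pair missing from one of the files is dropped from the result, which may or may not be what is wanted.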

  • But how can I read a single row of a specific column, like row 1 column 2, then row 2 column 2? `np.mean(csv_data[:, 2])` only averages over the whole column. – Skat1337 Jan 25 '17 at 20:24
  • [This answer](http://stackoverflow.com/questions/34734940/in-pandas-how-to-get-average-of-several-column-values-for-each-row-in-data-fram) should help you take the average across columns. – Fruitspunchsamurai Jan 25 '17 at 20:28
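
To illustrate the indexing asked about in the first comment, here is a small NumPy sketch. The arrays are made-up stand-ins for the `.values` of each file, and it assumes all files list their rows in the same order.

```python
import numpy as np

# Hypothetical arrays standing in for pd.read_csv(...).values of each file.
file1 = np.array([[1, 0, -98], [2, 0, -94]])
file2 = np.array([[1, 0, -97], [2, 0, -95]])
file3 = np.array([[1, 0, -95], [2, 0, -96]])

# A single cell: row 0, column 2 of file1.
single = file1[0, 2]

# file1[:, 2] is the whole third column; stacking one such column per file
# gives an array of shape (number of files, number of rows).
stacked = np.stack([file1[:, 2], file2[:, 2], file3[:, 2]])

# axis=0 averages across files, leaving one mean per row.
row_means = stacked.mean(axis=0)
print(row_means)
```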