Take average of each column in multiple csv files using Python

Question

I am a beginner in Python. I have searched for my problem but could not find an answer that matches my exact requirement.

I have a folder in which multiple files get stored, one for each experimental measurement. Their names follow a pattern, e.g. XY0001.csv, XY0002.csv ... XY0040.csv. I want to read all of these files and take the average of each column in all files, storing it in 'result.csv' in the same format.

Possible duplicate of [Import multiple csv files into pandas and concatenate into one DataFrame](http://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe) — Guido, Nov 22 '16 at 08:41

score 2 · Answer 1 · edited May 23 '17 at 12:24

I would suggest using pandas (import pandas as pd). Start by reading each file with pd.read_csv(). Exactly how to read the files depends on how your CSV files are formatted, which I cannot tell from here. If you want to read all files in a directory (which may be the easiest solution for this problem), look up how to read all files in a directory.

Then you can concatenate all files using pd.concat(). Lastly, you can calculate the metrics you want to generate (use the search functionality to find how to calculate each specific metric). A convenient method is DataFrame.describe(), which reports the count, mean, standard deviation, minimum, quartiles and maximum of each numeric column in one call.
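As a rough sketch of the steps above, the snippet below reads two small in-memory CSVs in place of real files, concatenates them and takes the mean of each column. The helper name mean_of_columns and the sample data are made up for illustration, and your own files may need extra read_csv() options.

```python
import io

import pandas as pd


def mean_of_columns(csv_sources):
    """Read every CSV source, stack the rows, and average each column."""
    frames = [pd.read_csv(src) for src in csv_sources]
    combined = pd.concat(frames, ignore_index=True)
    # numeric_only=True skips any text columns instead of raising an error
    return combined.mean(numeric_only=True)


# Hypothetical sample data standing in for XY0001.csv and XY0002.csv
sample_a = io.StringIO("Col1,Col2\n1,10\n3,30\n")
sample_b = io.StringIO("Col1,Col2\n5,50\n7,70\n")

means = mean_of_columns([sample_a, sample_b])
print(means)
```

The result is a Series indexed by column name, so means["Col1"] gives the average of that column over all rows of all inputs.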

score 1 · Answer 2 · answered Nov 25 '16 at 04:35

To access multiple files you can use the glob module.

import glob

path = r'/home/root/csv_directory'
filenames = glob.glob(path + "/*.csv")

Python's pandas module has a function to parse CSV files. It also has options to manage and process CSV files.

import pandas as pd

dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))

The pd.read_csv() function parses a CSV file into a DataFrame.

df = pd.concat(dfs, ignore_index=True)

pd.concat() concatenates all the data into one DataFrame, which is easier to process.
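To get from the concatenated DataFrame to result.csv, something along these lines might work. It is only a sketch: the function name write_column_averages, the XY*.csv pattern (taken from the file names in the question) and the sample values are assumptions, and it averages each column over all rows of all files.

```python
import glob
import os
import tempfile

import pandas as pd


def write_column_averages(folder, pattern="XY*.csv", output_name="result.csv"):
    """Average each column over all matching files and save one summary row."""
    # The pattern keeps result.csv itself from being read back in on a
    # second run; sorted() gives a stable file order
    filenames = sorted(glob.glob(os.path.join(folder, pattern)))
    if not filenames:
        raise FileNotFoundError("no files matching %r in %r" % (pattern, folder))

    combined = pd.concat([pd.read_csv(name) for name in filenames],
                         ignore_index=True)

    # mean() returns a Series indexed by column name; turn it back into a
    # one-row DataFrame so the output keeps the original header
    averages = combined.mean(numeric_only=True).to_frame().T
    averages.to_csv(os.path.join(folder, output_name), index=False)
    return averages


# Usage with throwaway files holding hypothetical sample data
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "XY0001.csv"), "w") as f:
        f.write("Col1,Col2\n1,10\n3,30\n")
    with open(os.path.join(tmp, "XY0002.csv"), "w") as f:
        f.write("Col1,Col2\n5,50\n7,70\n")

    result = write_column_averages(tmp)
    with open(os.path.join(tmp, "result.csv")) as f:
        header_line = f.readline().strip()

print(result)
```

Writing a one-row file with the original header is one reading of "the same format". If you need one row of averages per input file instead, compute the mean of each DataFrame before concatenating.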

score 0 · Answer 3 · answered Nov 22 '16 at 09:23

The following makes use of the glob module to get a list of all files in the current folder of the form X*.csv, i.e. all CSV files starting with X. For each file it finds, it first skips a header row (optional) and then loads all remaining rows, using a zip() trick to transpose the list of rows into a list of columns.

For each column, it converts each cell into an integer and sums the values, dividing this total by the number of elements found, thus giving an average for each column. It then writes the values to your output result.csv in the format filename, av_col1, av_col2 etc:

import glob
import csv

with open('result.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)

    for filename in glob.glob('X*.csv'):
        print(filename)

        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)
            header = next(csv_input)
            averages = []

            for col in zip(*csv_input):
                averages.append(sum(int(x) for x in col) / len(col))

        csv_output.writerow([filename] + averages)

So if you had XY0001.csv containing:

Col1,Col2,Col3
6,1,10
2,1,20
5,2,30

result.csv would be written as follows:

XY0001.csv,4.333333333333333,1.3333333333333333,20.0

Tested using Python 3.5.2
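If the zip() transposition is unclear, this small sketch runs the same idea on the sample rows above without touching the disk. It uses io.StringIO in place of an open file and swaps int() for float(), on the assumption that real measurements may contain decimals (int() raises ValueError on a string such as "1.5").

```python
import csv
import io

# Sample rows taken from the XY0001.csv example; io.StringIO stands in
# for an open file so the snippet runs without any files on disk
sample = io.StringIO("Col1,Col2,Col3\n6,1,10\n2,1,20\n5,2,30\n")

reader = csv.reader(sample)
header = next(reader)           # consume the header row
columns = list(zip(*reader))    # rows transposed into tuples of column cells

# float() instead of int() so decimal readings would also be accepted
averages = [sum(float(x) for x in col) / len(col) for col in columns]

print(columns[0])   # the cells of the first column, still as strings
print(averages)
```

Each tuple in columns holds the cells of one column, so the averaging loop never needs to index into individual rows.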
