
I have a set of CSV files (say, 30) and I want to compute the average across all of them, using the values at corresponding positions, and write the result to a new output.csv file.

Example CSV file (the real files have 13 columns and 16 rows):

| Dataset | VALUE1 | VALUE2 |
|:---- |:------:| -----:|
| Name1  | 2.4    | 4.2 |
| Name2  | 3.5    | 9.3 |
| Name3  | 4.6    | 11.5 |

I have 30 CSV files like this, where the first row is the header and the first column contains string names.

What I want is the average over all 30 files at each position: for example, add up the VALUE1 entry for Name1 from all 30 files and write the average of those 30 values to the same position in the output file. This should be done for every position except the first row and the first column, since those contain strings.

I tried with both pandas and NumPy, but no luck so far.

My code:

    import pandas as pd
    from pathlib import Path

    root = '../Dataset'

    # Collect every summary_x.csv found one level below the root directory.
    file_names_list = []
    entries = Path(root)
    for entry in entries.iterdir():
        if entry.is_dir():
            for file in entry.iterdir():
                if file.is_file() and file.name == 'summary_x.csv':
                    file_names_list.append(file)

    print(file_names_list)

    # Add the value columns (columns 1-12) of each file to a running total.
    df_final = pd.DataFrame()
    value_cols = list(range(1, 13))  # renamed so the built-in range is not shadowed
    for file_name in file_names_list:
        df = pd.read_csv(file_name, skiprows=0, usecols=value_cols)
        print(df)
        df_final = df_final.add(df.reset_index(), fill_value=0)

    df_final.to_csv('output.csv')

Edit: With the updated code the DataFrames are added, but the column index is not the same as in the original file and there are empty cells, I suppose because 0.0 was added many times.

Zanis Ali
  • Your code is incomplete? Why not import the file into a dataframe and perform your calculations when you are getting the file name listing? – C. Cooney Dec 29 '20 at 12:48
  • @C.Cooney I tried several versions using `pd.concat()`, but the output shows `NaN` even though all the positions in the CSV files contain values – Zanis Ali Dec 29 '20 at 12:50
  • You likely need to research pandas `reset_index()`, which is a common cause of new NaN columns. – C. Cooney Dec 29 '20 at 12:53

1 Answer


You can use `DataFrame.add` to sum the values of the DataFrames, and then divide each value in the resulting DataFrame by the number of DataFrames you added, for example with `DataFrame.applymap`.
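A minimal sketch of that idea, assuming every file has the same header row and the same names in the first column. The helper name `average_csv_files` and the two in-memory sample files are made up for illustration. Plain division by the file count is used here instead of `DataFrame.applymap`, since it has the same element-wise effect (and, as far as I know, recent pandas versions deprecate `applymap`). Reading with `index_col=0` keeps the name column out of the arithmetic, so no `reset_index()` is needed.

```python
import io

import pandas as pd


def average_csv_files(sources):
    """Element-wise mean of the numeric cells of several same-shaped CSVs.

    `sources` is a list of file paths or file-like objects
    (hypothetical helper, not part of pandas).
    """
    # index_col=0 turns the name column into the index, so only numbers are summed.
    frames = [pd.read_csv(src, index_col=0) for src in sources]
    total = frames[0]
    for df in frames[1:]:
        # add() aligns on the row names and the column headers.
        total = total.add(df)
    return total / len(frames)


# Two tiny in-memory "files" standing in for the real summary_x.csv files.
csv_a = io.StringIO("Dataset,VALUE1,VALUE2\nName1,2.0,4.0\nName2,3.0,9.0\n")
csv_b = io.StringIO("Dataset,VALUE1,VALUE2\nName1,4.0,6.0\nName2,5.0,11.0\n")

average = average_csv_files([csv_a, csv_b])
print(average)

# To write the result with the names and header included:
# average.to_csv("output.csv")
```

With the real files, the list passed in could be something like `sorted(Path('../Dataset').glob('*/summary_x.csv'))`, assuming the directory layout from the question (one `summary_x.csv` per subdirectory).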

When you break down your problem it's not that hard to find an answer :)

yakir0