-1

I have two csv files.

Csv 1
A 1
B 2
C 3

Csv 2
A 10
B 20
C 30

How do i find the sum difference of column 2

Like 10-1+20-2+30-3

So output csv would be

A 9
B 18
C 27

Total 54

I can achieve the difference but not total. Any help?

Rakesh
  • 81,458
  • 17
  • 76
  • 113
James
  • 1,124
  • 3
  • 17
  • 37

3 Answers3

1

Time to get familiar with pandas:

import pandas as pd
data = pd.read_csv(file_path, columns=['name', 'value'])
sum = data['value'].sum()

Pandas has heaps of well-documented functions that you can use to determine differences between columns/rows.

Or, for your example:

df = pd.DataFrame(columns=['name', 'value'])
for file in filepaths:
    new_df = pd.read_csv(file, columns=['name', 'value'])
    pd.append(df, new_df)
sum = data['value'].sum()

But this is quite basic and loses some information regarding where the original data came from, in which case you want to start thinking about how to properly construct your dataframe with indexes

PeptideWitch
  • 2,239
  • 14
  • 30
  • You are only achieving the second part of the question an not addressing adding to csv or the differences. also `data` in your second code is incorrect. and what is that link you are pointing to supposed to help with? – Akaisteph7 Jul 30 '19 at 12:28
1

Using the inbuild csv module.

Ex:

import csv

with open("Csv 1.csv") as csvfile_1, open("Csv 2.csv") as csvfile_2, open('outfile.csv', "w", newline="") as outfile_csv:
    reader = csv.reader(csvfile_1, delimiter=" ")     #read file 1
    reader2 = csv.reader(csvfile_2, delimiter=" ")    #read file 2
    writer = csv.writer(outfile_csv, delimiter=" ")   #Create writer object
    c = 0
    for m, n in zip(reader2, reader):
        val = int(m[1])-int(n[1])
        writer.writerow([m[0], int(m[1])-int(n[1])])  #Write result.
        c += val
    writer.writerow(["Total", c])  #Write Total
Rakesh
  • 81,458
  • 17
  • 76
  • 113
0

first read your csv using pandas(you have to use your fileName/ path instead of file1.csv, file2.csv):

import pandas as pd
df1 = pd.read_csv('file1.csv',header=None)
df2 = pd.read_csv('file2.csv',header=None)

then find difference into another dataframe:

df3 = df2
df3[1] = df2[1].sub(df1[1], axis = 0)

then write your final datafarame into csv:

df3.to_csv('file3.csv')

you can find sum:

print(df3[1].sum())

output: 54

print(df3)

output:

    0   1
0   A   9
1   B   18
2   C   27
SM Abu Taher Asif
  • 2,221
  • 1
  • 12
  • 14