0

I have a CSV file I want to import and sum the second field, "Revenue". This is my current code that I pieced together from other posts, but can't get it to work. Any help would be greatly appreciated!

Error Types:

  1. ValueError: invalid literal for int() with base 10: ''
  2. The total is incorrect. For example, if the numbers were simplified and it read: 10, 20, 30, 40. Instead of sum = 100 it would say sum = 10

Code:

import csv

import pandas as pd

f = open('Sample Data.csv')
f.next() #Skip record 1 aka header row

total = 0
for row in f:
    total += int(row[[1]])

print total

Sample Data

Todor Minakov
  • 19,097
  • 3
  • 55
  • 60
CJC2104
  • 15
  • 1
  • 2
    It is bad to post code or data as pictures. There is no way to copy and paste a picture into a code editor. So it will be much harder, and thus less likely, for someone to help. To get the most out of the site it is important to [ask good questions](http://stackoverflow.com/help/how-to-ask), that includes creating a [Minimal, Complete, and Verifiable](http://stackoverflow.com/help/mcve) example. – Stephen Rauch Feb 11 '18 at 09:36
  • Please separate different questions into different posts. However, the first would be a duplciate: https://stackoverflow.com/q/1841565/1394393, and the second is not reproducible because your indentation causes an error. – jpmc26 Feb 11 '18 at 09:37

1 Answers1

0

You have to clean-up the data - your values are with $ and , chars in them, and thus cannot be converted to integers. Also, they have a decimal part, so they probably better be treated as floats.

Yet before that, you are opening the file as a plain text file - not through the csv, nor as panda dataframe, though you have imported those modules. Being a text file, your code doesn't have the notion for rows and columns - every row in the iterator is just a simple string for it.

Here's those problems fixed - reading it as csv, and the clean-up - replacing the 2 offending chars with empty string, before the conversion:

with open('Sample Data.csv', 'r') as f:
    reader = csv.reader(f)
    reader.next()  # skipping the header row

    total = 0
    for row in reader:
        value = row[1].replace('$', '').replace(',', '')  # and any other replacements your data may need
        value = float(value)
        # value = int(value)  # if you want to treat the data as integers anyway
        total += value
Todor Minakov
  • 19,097
  • 3
  • 55
  • 60
  • 1
    Thank you! Worked like a charm. Apologies on the source file confusion. It is actually a CSV although the screenshot was from Excel – CJC2104 Feb 11 '18 at 18:45