-1

From a CSV covering various years, I am trying to total the values in another column for one particular year. For example, take these arbitrary values:

years value
2000 5
2001 4
2000 3
2002 6

and so on

with a running total defined as total_value = 0.

I have:

for year in years:
   if year == 2000:
       total_value += ???

where the returned value should be 8. I've tried using the .loc modifier to no avail. It's a simple question but I just can't get it right now!

Pranav Hosangadi
  • _"I've tried using the .loc modifier"_: show us what you tried! Filtering a dataframe by a column is a pretty common question. – Pranav Hosangadi Mar 15 '23 at 21:28
  • Does this answer your question? [How do I select rows from a DataFrame based on column values?](https://stackoverflow.com/questions/17071871/how-do-i-select-rows-from-a-dataframe-based-on-column-values) – Pranav Hosangadi Mar 15 '23 at 21:29

3 Answers

0
import pandas as pd
df = pd.read_csv('path_here.csv')
df.groupby('years').value.sum()[2000]
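
For reference, here is a minimal sketch of the same lookup on the sample data from the question. The DataFrame is built inline rather than read from a CSV, which is an assumption made only so the snippet runs on its own. It also shows a boolean-mask variant with .loc, which may be closer to what the question was attempting and avoids grouping every year when only one total is needed.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from a CSV
df = pd.DataFrame({"years": [2000, 2001, 2000, 2002], "value": [5, 4, 3, 6]})

# Group by year, sum each group, then pick the total for 2000
total_groupby = df.groupby("years")["value"].sum()[2000]

# Alternative: select only the rows for 2000 with a boolean mask, then sum
total_loc = df.loc[df["years"] == 2000, "value"].sum()

print(total_groupby, total_loc)  # 8 8
```

Both expressions compare against the integer 2000, so they rely on pandas having parsed the years column as integers.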
Michael Cao
0

Here are some alternative non-pandas answers, using the built-in csv module. Note that because I'm not converting the year values to ints, target_year is a string so that the comparison works.

  1. Using csv.DictReader to read each row, check the year and add the values.
import csv

total_value = 0
target_year = "2000"

with open("data.csv", newline="") as f:  # newline="" is what the csv docs recommend
    reader = csv.DictReader(f)
    for row in reader:
        if row["years"] == target_year:
            total_value += int(row["value"])

print(total_value)
  2. Using csv.reader and sum() with a list comprehension to get the matching values.
import csv

target_year = "2000"

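with open("data.csv", newline="") as f:  # newline="" is what the csv docs recommend
    reader = csv.reader(f)
    next(reader)  # Skip the header row
    # row[0] is the year and row[1] is the value, both read as strings
    total_value = sum([int(row[1]) for row in reader if row[0] == target_year])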

print(total_value)

If you want to check multiple years combined, you could put them all in a list and then use something like if row["years"] in target_years:
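
A minimal sketch of that multi-year check, assuming a comma-separated file with a years,value header. The sample rows are inlined through io.StringIO so the snippet runs without a file, and the helper name total_for_years is made up for illustration.

```python
import csv
import io

# Sample rows from the question, inlined so the sketch runs without a file.
# With a real file, pass open("data.csv", newline="") instead.
SAMPLE_CSV = "years,value\n2000,5\n2001,4\n2000,3\n2002,6\n"


def total_for_years(f, target_years):
    """Sum the 'value' column for rows whose 'years' field is in target_years."""
    reader = csv.DictReader(f)
    return sum(int(row["value"]) for row in reader if row["years"] in target_years)


# Years stay as strings because the csv module does not convert them
target_years = {"2000", "2002"}
total_value = total_for_years(io.StringIO(SAMPLE_CSV), target_years)
print(total_value)  # 14
```

A set is used for target_years here, but a list works the same way with the in check.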

nigh_anxiety
0

Here is a way using the loc method.

I assumed that the DataFrame has three columns: years, value, and total_value. This code sums the values for the selected year and writes the result to the total_value column in every row for that year.

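import pandas as pd
df = pd.read_csv("path.csv")

def total_value(year):
    # Boolean mask for the rows belonging to the selected year
    mask = df["years"] == year
    # Sum the matching values once instead of looping over every row
    total = df.loc[mask, "value"].sum()
    # Write the total into total_value for every row of that year
    df.loc[mask, "total_value"] = total
    return total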

I hope you found this helpful

oumaima