My program needs a function that reads data from a CSV file ("all.csv"), extracts every row for a given state on a specific date (each row that contains both the state name and the date), and writes the extracted rows to another CSV file named state + ".csv".

While the data is being written, the cases and deaths for that state on that specific date are totaled. The function then returns the total cases and deaths as a tuple (cases, deaths).

Example: state = 'California', date = '2020-03-09'

The error I get is that '0.0' and 'deaths' cannot be converted to an int. The first row is the header, which is why I get the error that 'deaths' cannot be converted to an int. So I have two questions:

  1. How can I skip the header 'deaths' (last column) and move on to the rest of the data?
  2. How can I convert the rest of the data (a string in decimal format) to an int?

Note: When I saved the link data to 'all.csv' the deaths column converted to decimal format (0.0).

Here is the contents of 'all.csv': https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv

Note that 'all.csv' has 7 columns, as opposed to the 6 columns in the CSV file at the link above.

Here is the program I have written:

import csv

input_file = 'all.csv'
state = input()
date = input() # date format m/d/yyyy
output_file = state + '.csv'


def number_of_cases_deaths_by_date(input_file, output_file, state, date):
    with open(input_file, 'r') as infile: #open both files
        contents = infile.readlines()
        
        with open(output_file, 'w') as outfile:
            writer = csv.writer(outfile)
        
            for row in range(len(contents)): # save data in list
                contents[row] = contents[row].split(',') #split elements
                contents[row][6] = contents[row][6].strip('\n') #strip \n from last column
                
            print(contents[3:5])
            cases = 0
            deaths = 0
            
            for row in range(len(contents)):
                if contents[row][3] == state and contents[row][1] == date: # if row has desired state, write it to new file
                    writer.writerow((contents[row]))
                    int_cases = int(contents[row][5])
                    cases = cases + int_cases
                    int_deaths = int(contents[row][6])
                    deaths += deaths + int_deaths
            return (cases, deaths)
                    
                
data = number_of_cases_deaths_by_date(input_file, output_file, state, date)
print(data)
Geraldo
  • [Please do not post text as images](https://meta.stackoverflow.com/q/285551). Copy and paste the text into your question and use the code formatting tool to format it correctly. Images are not searchable, and can not be interpreted by screen readers for those with visual impairments. Use the [edit](https://stackoverflow.com/posts/69957200/edit) link to modify your question. – Mark Tolonen Nov 13 '21 at 18:47
  • I think you can use pandas, numpy libraries to play with CSV data – user11823877 Nov 13 '21 at 18:50
  • "How can I convert the rest of the data (a string in decimal format) to an int?" In your own words, what do you think an `int` is? In your own words, why do you think it should be possible to convert "a string in decimal format" to one, and what do you think should be the result? – Karl Knechtel Nov 13 '21 at 19:08
  • "How can I skip the header 'deaths' (last column) and move on to the the rest of the data?" Did you try reading the documentation for the csv module? How about putting `python csv skip header` [into a search engine](https://duckduckgo.com/?q=python+csv+skip+header)? – Karl Knechtel Nov 13 '21 at 19:09

1 Answer

You didn't say why the data in all.csv differs from the file at your link, but after cleaning up your code and using the original downloaded data, this is straightforward:

import csv

def number_of_cases_deaths_by_date(input_file, output_file, desired_state, desired_date):

    total_cases = 0
    total_deaths = 0

    with open(input_file, encoding='utf8', newline='') as infile, \
         open(output_file, 'w', encoding='utf8', newline='') as outfile:

        reader = csv.reader(infile)
        writer = csv.writer(outfile)
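        header = next(reader)  # consume the header row so the loop sees only data rows

        # Each row of the original download has exactly these six columns.
        for date,county,state,fips,cases,deaths in reader: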
            if state == desired_state and date == desired_date:
                writer.writerow([date,county,state,fips,cases,deaths])
                total_cases += int(cases)
                total_deaths += int(deaths)

    return total_cases, total_deaths
                
input_file = 'us-counties.csv'
state = 'Oregon'
date = '2021-11-05'
output_file = f'{state}.csv'

data = number_of_cases_deaths_by_date(input_file, output_file, state, date)
print(data)

Output:

(372137, 4562)

Oregon.csv:

2021-11-05,Baker,Oregon,41001,2079,30
2021-11-05,Benton,Oregon,41003,5779,31
2021-11-05,Clackamas,Oregon,41005,31022,328
2021-11-05,Clatsop,Oregon,41007,2491,29
2021-11-05,Columbia,Oregon,41009,4024,47
2021-11-05,Coos,Oregon,41011,5294,98
2021-11-05,Crook,Oregon,41013,3114,53
2021-11-05,Curry,Oregon,41015,1846,27
2021-11-05,Deschutes,Oregon,41017,21677,138
2021-11-05,Douglas,Oregon,41019,12622,263
2021-11-05,Gilliam,Oregon,41021,168,4
2021-11-05,Grant,Oregon,41023,1039,14
2021-11-05,Harney,Oregon,41025,1172,30
2021-11-05,Hood River,Oregon,41027,2007,37
2021-11-05,Jackson,Oregon,41029,23807,330
2021-11-05,Jefferson,Oregon,41031,3972,60
2021-11-05,Josephine,Oregon,41033,9712,206
2021-11-05,Klamath,Oregon,41035,8552,127
2021-11-05,Lake,Oregon,41037,987,15
2021-11-05,Lane,Oregon,41039,28799,323
2021-11-05,Lincoln,Oregon,41041,3352,45
2021-11-05,Linn,Oregon,41043,13711,141
2021-11-05,Malheur,Oregon,41045,5814,82
2021-11-05,Marion,Oregon,41047,38270,472
2021-11-05,Morrow,Oregon,41049,1897,24
2021-11-05,Multnomah,Oregon,41051,57661,746
2021-11-05,Polk,Oregon,41053,7737,87
2021-11-05,Sherman,Oregon,41055,175,3
2021-11-05,Tillamook,Oregon,41057,2045,38
2021-11-05,Umatilla,Oregon,41059,14820,164
2021-11-05,Union,Oregon,41061,3294,50
2021-11-05,Wallowa,Oregon,41063,729,12
2021-11-05,Wasco,Oregon,41065,3033,41
2021-11-05,Washington,Oregon,41067,40121,342
2021-11-05,Wheeler,Oregon,41069,108,1
2021-11-05,Yamhill,Oregon,41071,9207,124
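
If you do need to keep working with the re-saved all.csv, two things differ from the code above: there is an extra column, and the counts are stored as text like '0.0', which `int()` rejects. A minimal sketch of one way to handle both, assuming the header row still uses the names date, state, cases and deaths as in the original download (I can't see your file, so that is an assumption). The names `to_int` and `totals_by_name` are made up for this example. `csv.DictReader` consumes the header row itself and looks columns up by name, so the extra column doesn't matter, and `int(float(text))` converts a decimal-formatted string:

```python
import csv


def to_int(text):
    """Convert a count such as '12', '12.0' or '' to an int (blank counts as 0)."""
    text = (text or '').strip()
    return int(float(text)) if text else 0


def totals_by_name(input_file, output_file, desired_state, desired_date):
    total_cases = 0
    total_deaths = 0

    with open(input_file, encoding='utf8', newline='') as infile, \
         open(output_file, 'w', encoding='utf8', newline='') as outfile:

        # DictReader reads the header row and uses it as the column names,
        # so the loop below only ever sees data rows.
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()

        for row in reader:
            # The comparison is on the raw text, so desired_date must be
            # written exactly the way dates appear in the file.
            if row['state'] == desired_state and row['date'] == desired_date:
                writer.writerow(row)
                total_cases += to_int(row['cases'])
                total_deaths += to_int(row['deaths'])

    return total_cases, total_deaths
```

Call it the same way as the function above, e.g. `totals_by_name('all.csv', f'{state}.csv', state, date)`. Unlike the code above, this version also copies the header row to the output file; drop the `writer.writeheader()` line if you don't want that. Treating a blank count as 0 is a choice made for this sketch, so change `to_int` if blanks should be handled differently.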
Mark Tolonen
  • OP asked two questions, both of which are ready duplicates and also easy to research. The only thing that was good about the question is that it provided adequate information to be answered. Please do not attempt answers like this, especially without any explanation. Stack Overflow is not a code-writing service. – Karl Knechtel Nov 13 '21 at 19:12