I have a function (main) that reads data from a CSV file and converts it into a dictionary. The keys are the entries in the first column, and each value is a list of the other entries in that row. For example, the row 2020-12-20,0,0,0,0,206 gives the key 2020-12-20 and the value ['0', '0', '0', '0', '206'], a list of strings:

def main():
    import csv
    # doses_data_mar_20.csv
    dict_doses_by_date = {}

    filename_input = str(input("Please enter a .csv file to read: "))
    with open(filename_input, "r") as inp, open('doses.csv', 'w') as out:
        header = inp.readline()
        reader = csv.reader(inp, delimiter=",", quotechar='"')
        for line in reader:
            dict_doses_by_date[line[0]] = line[1:6]
    return dict_doses_by_date

def count_doses_by_date(dict_dose_by_date):

Now I need to define a new function, count_doses_by_date, that takes each list of strings as input, converts it into a list of integers, and adds the integers to get a total. It should then write the results to another CSV file.

I tried doing this:

def count_doses_by_date(dict_dose_by_date):
    import csv
    # doses_data_mar_20.csv
    dict_doses_by_date = {}
    filename_input = str(input("Please enter a .csv file to read: "))
    with open(filename_input, "r") as inp, open('doses.csv', 'w') as out:
        header = inp.readline()
        reader = csv.reader(inp, delimiter=",", quotechar='"')
        for line in reader:
            dict_doses_by_date[line[0]] = line[1:6]
        for k in dict_doses_by_date:
            list_integers = [int(x) for x in dict_doses_by_date[k]]
            sum_integers = sum(list_integers)
            print_value = "{}, {} \n".format(k, sum_integers)
    return out.write(print_value)

but I’m getting errors because some of the lists contain strings like '1,800', whose commas prevent them from being converted to integers. I don't know how to get rid of these thousands commas without disrupting the commas that separate the CSV values.

I'm stuck. How would this be done?

  • Take a look at [`map`](https://docs.python.org/3/library/functions.html#map), and try something like `map(int, yourlist)` – Nick Jun 13 '22 at 02:09
  • `map` returns a map object, you'll want something more like `list(map(int,yourlist))` – Mous Jun 13 '22 at 02:10
  • Does this answer your question? [Convert all strings in a list to int](https://stackoverflow.com/questions/7368789/convert-all-strings-in-a-list-to-int) – Nick Jun 13 '22 at 02:10
  • @Mous you can `sum` a `map` without converting to a list... – Nick Jun 13 '22 at 02:11
  • Ah, I missed that part. Thanks. – Mous Jun 13 '22 at 02:12
  • Welcome to Stack Overflow. "now I need to define a new function count_doses_by_date that takes each list of strings as an input and converts each of these lists of strings into a list of integers and add all the integers to get their totals. then outputs this into another csv file." Okay; so, in other words, it **should not** ask for the name of a CSV file, or try to read the CSV file, or try to create a dictionary. Instead, it should *use* the dictionary that is *being passed to it as a parameter*. – Karl Knechtel Jun 13 '22 at 02:29
  • "I don't know how to get rid of there's thousands commas without disrupting the commas that separate the csv values." You don't have to worry about this, because the job of interpreting the CSV data and separating the values **was already done**. That's why you have a *list of strings*, not a single string with extra commas in it. – Karl Knechtel Jun 13 '22 at 02:30
  • Please, take a look at https://stackoverflow.com/questions/1779288/how-to-convert-a-string-to-a-number-if-it-has-commas-in-it-as-thousands-separato. – Ignatius Reilly Jun 13 '22 at 02:35
  • `list_integers = [int(x.replace(',','')) for x in dict_doses_by_date[k]]` should do it. – Ignatius Reilly Jun 13 '22 at 02:37
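
Pulling the comments above together, a minimal sketch might look like the following. It assumes the dictionary has the shape returned by `main()` in the question; the function name comes from the question, while the sample data is made up for illustration. It works on the dictionary it is given rather than reading the file again.

```python
def count_doses_by_date(dict_doses_by_date):
    """Return {date: total} for a {date: [count strings]} dictionary."""
    totals = {}
    for date, counts in dict_doses_by_date.items():
        # Strip thousands separators such as '1,800' before converting.
        totals[date] = sum(int(x.replace(",", "")) for x in counts)
    return totals


# Made-up sample in the shape produced by main()
sample = {
    "2020-12-20": ["0", "0", "0", "0", "206"],
    "2020-12-21": ["0", "0", "10", "0", "1,800"],
}
totals = count_doses_by_date(sample)
print(totals)
```

Because the CSV reader has already split each row into separate strings, removing the comma inside `'1,800'` cannot affect the field separators.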

3 Answers

If your string is something like "1234", you can call

int(number, base=base)

and you will get an integer. For example,

print(int("1234"))

prints 1234.

Please check the rest of documentation here: https://docs.python.org/3/library/functions.html#int

To get the totals, you can proceed as suggested in the comments or in whatever way you prefer: loop through the list of elements, add each converted value to a running total (total += int("1234")), then return the total and write it to the file.

Of course, if your strings contain unexpected symbols such as thousands commas, you need to normalize them before calling int(), for example by removing the character with replace().
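
As a rough sketch of the normalize-then-accumulate loop described above (the helper name `parse_count` and the sample row are hypothetical, chosen only for illustration):

```python
def parse_count(text):
    """Convert a string such as '1,800' or ' 206 ' to an int."""
    # Remove thousands commas and surrounding whitespace first.
    return int(text.replace(",", "").strip())


row = ["0", "0", "0", "1,800", "206"]  # illustrative values

total = 0
for entry in row:
    total += parse_count(entry)

print(total)
```

Keeping the normalization in one small helper means any other cleanup the data turns out to need can be added in a single place.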

Lithe
  • what about strings that have thousands commas? – eshtabel3asal Jun 13 '22 at 02:16
  • @eshtabel3asal as part of your process and right before you add up numbers you will have to normalize the whole set. For example, you can simply remove any ',' character from the string. if your number is n="1,300" then do total+=int(n.replace(',')) – Lithe Jun 13 '22 at 02:20
  • @Lithe would this work `sum_integers = sum(list_integers.replace(","))`? – eshtabel3asal Jun 13 '22 at 02:31
  • `.replace()` requires 2 arguments, for example `.replace(",", "")` – blackraven Jun 13 '22 at 04:49
  • @perpetualstudent totally, my apologies. I´ll avoid posting in the middle of the night I guess. – Lithe Jun 14 '22 at 18:18

Would you try this? Use str.isdigit() to check whether each entry is a number:

line = ['2020-12-20', '0', '0', '0', '0', '206']
filtered_line = [int(e) if e.isdigit() else '' for e in line[1:6]]
print([x for x in filtered_line if x != ''])

Output

[0, 0, 0, 0, 206]

Edit: I missed the part about the thousands separator. For your use case, the code could be:

dict_doses_by_date = {}
reader = [['2020-12-20', '0', '0', '0', '10', '206'], ['2020-12-21', '0', '0', '0', '20', '316'], ['2020-12-22', '0', '0', '0', '30', '1,426']]

for line in reader:
    list_integers = [int(x.replace(',', '')) for x in line[1:6]]
    dict_doses_by_date[line[0]] = list_integers
    print_value = "{}, {} \n".format(line[0], sum(list_integers))
    print(print_value)

print(dict_doses_by_date)

Output

2020-12-20, 216

2020-12-21, 336

2020-12-22, 1456

{'2020-12-20': [0, 0, 0, 10, 206], '2020-12-21': [0, 0, 0, 20, 316], '2020-12-22': [0, 0, 0, 30, 1426]}
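
The question also asks for the totals to be written to another CSV file. One possible sketch uses `csv.writer`; here `io.StringIO` stands in for a real file so the result can be inspected, and the header names are assumptions rather than anything given in the question.

```python
import csv
import io

totals = {"2020-12-20": 216, "2020-12-21": 336, "2020-12-22": 1456}

# io.StringIO stands in for a real file; in practice one would use
# open("doses.csv", "w", newline="") instead.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["date", "total"])  # hypothetical header names
for date, total in totals.items():
    writer.writerow([date, total])

print(out.getvalue())
```

Letting `csv.writer` build each row avoids hand-formatting the line with `format()` and takes care of quoting if a value ever contains a comma.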
blackraven

You could use the pandas library. pd.read_csv gives you a dataframe directly from the file, and you can set the first column as the index column. Then df.applymap(lambda x: int(x.replace(',', ''))) gets rid of the commas and converts to int (this requires the values to be read as strings, for example with dtype=str), and df.sum(axis=1) gives a row-by-row sum.
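
A minimal sketch of the pandas route is below. Note that it swaps in a different technique from the applymap call: it passes thousands="," to pd.read_csv so the parser handles the separators itself. The CSV text and its header names are made up for illustration, and pandas must be installed.

```python
import io

import pandas as pd

# Made-up CSV text; the header names are hypothetical.
csv_text = (
    "date,a,b,c,d,e\n"
    "2020-12-20,0,0,0,0,206\n"
    '2020-12-21,0,0,0,20,"1,800"\n'
)

# thousands="," asks the parser to read "1,800" as 1800.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, thousands=",")
totals = df.sum(axis=1)  # one total per date (row)
print(totals)
```

From there, totals.to_csv(...) could write the result out, with the file name left to the caller.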

Acccumulation