I have a school assignment that asks me to write a program that first reads in the name of an input file and then reads the file using the csv.reader() method. The file contains a list of words separated by commas. The program should output each word and its frequency (the number of times the word appears in the file), without any duplicates. I have more or less figured out how to do this for one specific input file, but the program needs to be able to read multiple input files. This is what I have so far:

with open('input1.csv', 'r') as input1file:

    csv_reader = csv.reader(input1file, delimiter = ',')
    for row in csv_reader:
        new_row = set(row)
        
    for m in new_row:
        count = row.count(m)
        print(m, count)

This is what I get:

woman 1
man 2
Cat 1
Hello 1
boy 2
cat 2
dog 2
hey 2
hello 1 

This (almost) works for the input1 file, except that the output order changes each time I run it. I also need it to work for two other input files. How can I do that?

sample CSV

hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy
  • Make a function out of it so you can apply it to multiple files. Also, what order changes? It also helps if you provide some sample input... – Edo Akse Apr 16 '22 at 14:45
  • Thank you, Edo Akse. I am new to Python programming; I know what a function is and I have written one before, but I am not clear on how to apply it to this problem. The sample input (just one of the CSV files) was: hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy. Does that help? – Anecia Chavis-Puller Apr 16 '22 at 17:50

1 Answer

See the code below for an example. I've commented it so you can see what it does and why.

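The order is different on each run of your implementation because you use a set. A set is by definition unordered, and Python randomizes string hashing per process by default, so the iteration order of a set of strings can change from one run to the next.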
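If you want to drop duplicates but keep the words in the order they first appear, one option is to deduplicate with a dict instead of a set, since dicts preserve insertion order in Python 3.7+. This is only a minimal sketch using the sample row from the question, not a full solution:

```python
# Sample row taken from the question.
row = "hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy".split(",")

# dict.fromkeys() keeps the first occurrence of each word, in the order
# the words appear (dicts preserve insertion order in Python 3.7+).
unique_words = list(dict.fromkeys(row))

for word in unique_words:
    print(word, row.count(word))
```

With this, the output order is the same on every run: hello, cat, man, hey, dog, boy, Hello, woman, Cat.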

Also note that your implementation passes over the row more than once: once to turn it into a set, and again for every row.count() call. Besides this, if the file contains more than one row, your logic fails, because the counting loop sits outside the loop over the rows and therefore only ever sees the last row of the file.

import csv


def count_things(filename):
    with open(filename) as infile:
        csv_reader = csv.reader(infile, delimiter = ',')
        result = {}
        for row in csv_reader:
            # go over the row by element
            for element in row:
                # does it exist already?
                if element in result:
                    # if yes, increase count
                    result[element] += 1
                else:
                    # if no, add and set count to 1
                    result[element] = 1

    # sorting, explained in detail here:
    # https://stackoverflow.com/a/613218/9267296
    return {k: v for k, v in sorted(result.items(), key=lambda item: item[1], reverse=True)}
    # you could just return unsorted result by using:
    # return result


for key, value in count_things("input1.csv").items():
    # iterate over items() by key/value pairs
    # see this link:
    # https://www.w3schools.com/python/python_dictionaries_access.asp
    print(key, value)
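
As an alternative, the same counting can be sketched with collections.Counter from the standard library, which does the "increase or set to 1" bookkeeping for you and keeps the words in first-seen order. The function names count_words and print_counts are made up for this example, and newline="" follows the csv module's recommendation for files passed to csv.reader:

```python
import csv
from collections import Counter


def count_words(filename):
    """Return a Counter mapping each word in the CSV file to its frequency."""
    counts = Counter()
    # newline="" is what the csv documentation recommends when opening
    # a file that will be handed to csv.reader().
    with open(filename, newline="") as infile:
        for row in csv.reader(infile):
            # update() adds one to the count of every element in the row,
            # so files with several rows are handled as well.
            counts.update(row)
    return counts


def print_counts(filename):
    """Print every word and its count, in order of first appearance."""
    for word, count in count_words(filename).items():
        print(word, count)
```

To read the file name from the user, as the assignment asks, you could call print_counts(input()). To handle several files, call print_counts() once per file name, for example in a loop over a list of names.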
Edo Akse