0

I have a list that returns frequently used words in a .txt. How can I send those to a CSV (or other Excel file) where one column has the word, and the other has the frequency.

For example, here's the start of my Counter(my_list) returned values:

Counter({'the': 3317, 'to': 1845, 'and': 1812, 'a': 1580, '': 1248, 'of': 1248, 'Harry': 1213, 'was': 1179, 'he': 1034, 'in': 932, 'his': 895, 'it': 803, 'said': 793, ...

I'd like each word to be in a column, say A, and the count, in B. like

the | 3317
to  | 1845
and | 1812
a   | 1580

Etc. (Note that it can be sorted alphabetically in the CSV. I'm just trying to get it in there to analyse).

Here's what I have now:

def create_csv(my_list):
    with open(r'myCSV.csv', 'w', newline='') as my_CSV:
        fieldnames = ['word','count']
        writer = csv.writer(my_CSV)
        writer.writerow(fieldnames)
        for key, value in my_list.items():
            writer.writerow(list(key) + [value])

This almost works, except each letter is in a column, followed by the count:

enter image description here

What do I need to change so that the word stays together?

edit: to be sure, here's the function I'm using to create the list. (my_file is a .txt file)

def unique_words():
    with open(my_file, encoding="utf8") as infile:
        for line in infile:
            words = line.split()
            for word in words:
                edited_word = clean_word(word)
                lst.append(edited_word)
                if edited_word not in lst:
                    lst.append(edited_word)     
    lst.sort()  
    return lst, cnt

and calling it via:

create_csv(Counter(lst))
Ivan Kolesnikov
  • 1,787
  • 1
  • 29
  • 45
BruceWayne
  • 22,923
  • 15
  • 65
  • 110
  • 1
    To see what's really going on here, we should look at the raw CSV file, not the Excel output. You can open a csv in Notepad (or preferably a code editor, like Notepad++). – jpaugh Aug 11 '17 at 14:42
  • @jpaugh oh hmm, good point. I'll take a look when I'm back at the computer. Thanks for the idea! – BruceWayne Aug 11 '17 at 14:43
  • 1
    Never trust an application which is more than a few megabytes in size: whatever it does is really, really complicated. ;-) – jpaugh Aug 11 '17 at 15:02

1 Answers1

2

Don't do list(key). Directly putting key should work. Now, assuming words are on one line and space separated,

def Counter(my_file):
    count = {}
    with open(my_file, encoding="utf-8") as infile:
        for line in infile:
            words = line.strip().split()
            for word in words:
               #Assuming clean_word is a function to get rid of full stops, commas etc.
               edited_word = clean_word(word)
               count[edited_word] = count.get(edited_word, 0) + 1
    return count

def create_csv(my_list):
    with open(r'myCSV.csv', 'w', newline='') as my_CSV:
        fieldnames = ['word','count']
        writer = csv.writer(my_CSV)
        writer.writerow(fieldnames)
        for key, value in count.items():
            writer.writerow([key, str(value)])
N M
  • 596
  • 4
  • 18
  • Doing that, I get an error: `TypeError: must be str, not list`. Edit: If I do `writer.writerow(str(key) + str([value]))` I get (each character in its own cell, same row), `A L B U S [ 3 ]`, etc. – BruceWayne Aug 11 '17 at 05:40
  • Try opening the file with open(r'myFile.csv','w') – N M Aug 11 '17 at 05:45
  • Nothing changes, still getting each character in its own cell :/ . ...would it help to post the function that I'm using to actually create the list? Edit: Note, when I do what you suggest in the code provided, an error occurs `name 'count' is not defined`. – BruceWayne Aug 11 '17 at 05:48
  • Sorry, count was a typo. It should be my_list. Also, how are the words stored in the file? Are they on separate lines or on one line and space separated? – N M Aug 11 '17 at 06:15
  • Yeah, I figured. I tried that and still get the separate letter per cell. In the .txt, the words are on multiple lines. For example you can use the [Simple Sabotage Field Manual](https://www.gutenberg.org/cache/epub/26184/pg26184.txt) – BruceWayne Aug 11 '17 at 06:20
  • Answer changed to answer comment. – N M Aug 11 '17 at 06:25
  • Hmm - Thanks for your continued help! But now it doesn't seem to be returning anything? I have at the very top (before all `def`), `myFile = "Manual.txt"`, and `count = {}` (to make it global, yeah?) -- but running it via `Counter(myFile)` then `create_csv(count)` I get nothing in the CSV... – BruceWayne Aug 11 '17 at 06:36
  • Instead of declaring count, just do count = Counter(myFile). Otherwise, you'll have to add the line global count inside the function. Refer to [global keyword](https://stackoverflow.com/questions/4693120/use-of-global-keyword-in-python) – N M Aug 11 '17 at 06:42
  • 1
    `str(key) + str([value])` is wrong, because it combines two strings into one big string. What you need, is a list of *columns* (in this case, a list of strings). This should work: `writer.writerow([key, value])`. Note that this exactly matches the OP's list of `fieldnames`, which is already working. – jpaugh Aug 11 '17 at 14:52
  • @BruceWayne Forgot to ping you. I think my previous comment (here) fixes your issue. – jpaugh Aug 11 '17 at 15:01
  • @jpaugh - Aha! I used `writer.writerow([key, str(value)]` and it worked! I'm getting the whole word in column A, and the count in Column B. This, with NM's excellent `def Counter()` function, solved it! :D Thanks everyone!!! (Quick note, I had to tab the `for word in words:` to be under `word =`). – BruceWayne Aug 11 '17 at 23:01
  • NM, since your answer has been accepted, can you update it to include the changes BruceWayne mentioned in the above comment? – jpaugh Aug 14 '17 at 15:40
  • @BruceWayne Glad to hear it! Can I pop by your mansion in Gotham? I've got this great business idea, involving multi-level marketing... – jpaugh Aug 14 '17 at 15:41
  • @jpaugh hmmm how many levels? Also, as a fan of geometry, I hope it's a triangularly shaped business model. – BruceWayne Aug 14 '17 at 16:26