Python: Creating a function to write .CSV files

Question

I want to save word frequency lists as .CSV for several corpora. Is there a way to make Python write the filenames automatically based on the variable name? (e.g.: corpus_a > corpus_a_typefrequency.csv)

I have the following code, which already works for individual corpora:

from collections import Counter
import csv

counts = Counter(corpus_a)

# Sort by frequency, most frequent first.
counts = dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))

# newline='' is what the csv docs recommend for files passed to csv.writer.
with open('corpus_a_typefrequency.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in counts.items():
        writer.writerow([key, value])

PS: it would be great if I could count only words (no punctuation), and do so case-insensitively. I haven't figured out how to do that here yet. I'm using data from the Brown Corpus as follows:

import nltk
from nltk.corpus import brown
corpus_a = brown.words()

I tried brown.words().lower().isalpha(), but that doesn't work.
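
A possible approach to the PS, as a minimal sketch: brown.words() returns a list-like sequence of tokens rather than a single string, so string methods such as .lower() and .isalpha() have to be applied to each token, for example in a generator expression. The helper name count_words and the sample tokens below are made up for illustration; with the real data you would pass brown.words() instead.

```python
from collections import Counter


def count_words(tokens):
    """Count tokens case-insensitively, keeping only purely alphabetic ones."""
    # str.isalpha() drops punctuation and numbers such as "," or "1961".
    return Counter(token.lower() for token in tokens if token.isalpha())


# Hypothetical sample standing in for brown.words().
sample = ["The", "the", ",", "Fulton", "said", ".", "THE", "1961"]
counts = count_words(sample)
print(counts.most_common())
# [('the', 3), ('fulton', 1), ('said', 1)]
```

Note that isalpha() also rejects tokens containing hyphens or apostrophes (e.g. "well-known"), so a looser filter may be needed if those should count as words.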

Where does the idea for the name "corpus_a" come from, in this example? Looking at your comment on the other answer, it seems like this code is already inside a function? If so, can you change it so that instead of a variable like `corpus_a`, you have `corpus` and `corpus_name`, and those two variables are set at the same time? – Zach Young, Nov 05 '21 at 17:34
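
The suggestion in the comment above could be sketched roughly as follows: rather than trying to recover the variable name, pass the corpus and its name together, and keep several corpora in a dict keyed by name. The function name write_type_frequency, the out_dir parameter, and the toy corpora are assumptions made for illustration, not part of the original code.

```python
import csv
from collections import Counter
from pathlib import Path


def write_type_frequency(corpus, corpus_name, out_dir="."):
    """Write '<corpus_name>_typefrequency.csv' and return its path."""
    counts = Counter(corpus)
    path = Path(out_dir) / f"{corpus_name}_typefrequency.csv"
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        # most_common() yields (word, count) pairs, most frequent first.
        writer.writerows(counts.most_common())
    return path


# Hypothetical toy corpora; with NLTK these could be e.g. brown.words().
corpora = {
    "corpus_a": ["the", "cat", "the", "dog"],
    "corpus_b": ["a", "b", "a", "a"],
}

for name, tokens in corpora.items():
    print(write_type_frequency(tokens, name))
```

Keeping the name as a dict key means the filename no longer depends on how a variable happens to be called, and adding a new corpus is one more dict entry.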

score 1 · Answer 1 · edited Nov 05 '21 at 14:51

You should have a look at this answer: https://stackoverflow.com/a/40536047/5289234. It will allow you to extract the variable name and use it to save the csv.

import csv
import inspect
from collections import Counter


def retrieve_name(var):
    """
    Gets the name of var. Searches from the outermost frame inwards.
    :param var: variable to get name from.
    :return: string
    """
    for fi in reversed(inspect.stack()):
        names = [var_name for var_name, var_val in fi.frame.f_locals.items() if var_val is var]
        if len(names) > 0:
            return names[0]


# corpus_a is the token sequence defined in the question.
counts = Counter(corpus_a)

# Sort by frequency, most frequent first.
counts = dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))

# newline='' is what the csv docs recommend for files passed to csv.writer.
with open(retrieve_name(corpus_a) + '_typefrequency.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in counts.items():
        writer.writerow([key, value])

edited Nov 05 '21 at 14:51

erip

answered Nov 05 '21 at 14:45

GbjC

Thanks for this! I'm still a bit confused about the syntax though. "corpus_a" is inside the function, so how would I use it for e.g. "corpus_b"? (sorry, I'm a beginner) – matinier Nov 05 '21 at 15:30
Instead of copying another answer, you should flag the question as a duplicate if you believe it is already answered elsewhere – Tomerikoo Nov 05 '21 at 15:39
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Community Nov 05 '21 at 16:17
@Tomerikoo, I am not sure what you really want to do. Are you trying to automatically create a number of corpora from different sources and export a single CSV for each corpus? Maybe you could use an Enum and iterate over its members; see the standard library [Enum](https://docs.python.org/3/library/enum.html) documentation. – GbjC Nov 05 '21 at 16:38
Mmm this is not my question. If you don't understand the question how could you answer it? – Tomerikoo Nov 05 '21 at 16:43
