
I want to save word frequency lists as CSV files for several corpora. Is there a way to make Python generate the filenames automatically from the variable name (e.g. corpus_a > corpus_a_typefrequency.csv)?

I have the following code, which already works for individual corpora:

from collections import Counter
import csv

counts = Counter(corpus_a)
# Sort by frequency, highest first
counts = dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))

with open('corpus_a_typefrequency.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in counts.items():
        writer.writerow([key, value])

PS: It would be great if I could count only words (no punctuation), and do so case-insensitively. I haven't figured out how to do that yet. I'm using data from the Brown Corpus as follows:

import nltk
from nltk.corpus import brown
corpus_a = brown.words()

I tried brown.words().lower().isalpha(), but that doesn't work.
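
The call above fails because brown.words() returns a sequence of tokens rather than a single string, so string methods such as .lower() and .isalpha() have to be applied to each token in turn. A minimal sketch of that idea, assuming the tokens are plain strings (the helper name normalize_tokens and the sample list are made up for illustration and stand in for brown.words()):

```python
from collections import Counter


def normalize_tokens(tokens):
    """Lowercase each token and keep only purely alphabetic ones."""
    return [token.lower() for token in tokens if token.isalpha()]


# Hypothetical sample standing in for brown.words()
sample = ["The", "dog", ",", "the", "DOG", ".", "1961", "well-known"]

counts = Counter(normalize_tokens(sample))
print(counts.most_common())  # [('the', 2), ('dog', 2)]
```

Note that str.isalpha() also drops numbers and hyphenated tokens such as "well-known"; whether that is acceptable depends on what should count as a word.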

matinier
  • Where does the idea for the name "corpus_a" come from, in this example? Looking at your comment on the other answer, it seems like this code is already inside a function? If so, can you change it so instead of variable like `corpus_a`, you have `corpus` and `corpus_name`, and those two variables are set at the same time? – Zach Young Nov 05 '21 at 17:34

1 Answer


You should have a look at this answer: https://stackoverflow.com/a/40536047/5289234. It will allow you to extract the variable name and use it to save the csv.

import inspect


def retrieve_name(var):
    """
    Get the name of var, searching from the outermost frame inwards.
    :param var: variable to get the name from.
    :return: string, or None if no matching name is found
    """
    for fi in reversed(inspect.stack()):
        names = [var_name for var_name, var_val in fi.frame.f_locals.items() if var_val is var]
        if len(names) > 0:
            return names[0]

from collections import Counter
import csv
counts = Counter(corpus_a)
# Sort by frequency, highest first
counts = dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))

with open(retrieve_name(corpus_a) + '_typefrequency.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in counts.items():
        writer.writerow([key, value])
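
An alternative that avoids inspecting stack frames, along the lines of Zach Young's comment on the question, is to keep each corpus next to its name, for example in a dict, and build the filename from the key. This may be more robust, since retrieve_name returns the first matching name it finds, which can be ambiguous when several variables refer to the same object. A minimal sketch; the function name write_frequency_csv, its directory parameter, and the tiny sample corpora are assumptions for illustration, not part of the question:

```python
import csv
from collections import Counter
from pathlib import Path


def write_frequency_csv(name, tokens, directory="."):
    """Write <name>_typefrequency.csv with word,count rows; return its path."""
    counts = Counter(tokens)
    path = Path(directory) / f"{name}_typefrequency.csv"
    # newline="" is what the csv module docs recommend for files given to csv.writer
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        # most_common() yields (word, count) pairs, highest count first
        writer.writerows(counts.most_common())
    return path


# Hypothetical stand-ins; in the question these would be e.g. brown.words()
corpora = {
    "corpus_a": ["the", "cat", "the", "mat"],
    "corpus_b": ["a", "dog", "a", "a", "bone"],
}

for name, tokens in corpora.items():
    print(write_frequency_csv(name, tokens))
```

To handle another corpus, add one more entry to the dict; the key becomes the filename prefix.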
erip
GbjC
  • Thanks for this! I'm still a bit confused about the syntax though. "corpus_a" is inside the function, so how would I use it for e.g. "corpus_b"? (sorry, I'm a beginner) – matinier Nov 05 '21 at 15:30
  • Instead of copying another answer, you should flag the question as a duplicate if you believe it is already answered elsewhere – Tomerikoo Nov 05 '21 at 15:39
  • @Tomerikoo, I am not sure what you really want to do. Are you trying to automatically create a number of corpora from different sources and export a single CSV for each corpus? Maybe you could use an Enum and iterate over all of them; see the standard library [Enum](https://docs.python.org/3/library/enum.html). – GbjC Nov 05 '21 at 16:38
  • Mmm this is not my question. If you don't understand the question how could you answer it? – Tomerikoo Nov 05 '21 at 16:43