
I have this script: it reads a file (the file consists of collected tweets), cleans it, gets the frequency distribution, and creates a plot. Right now I can only work with one file. What I need is to turn it into a function so I can pass in more files, and then create a DataFrame with the FreqDist results from all of those files so I can plot them together.

import nltk
import pandas as pd
import matplotlib.pyplot as plt
from string import punctuation
from nltk.corpus import stopwords

with open(.......) as f:
    text = f.read()
text = text.lower()
# Remove every punctuation character
for p in punctuation:
    text = text.replace(p, '')

allWords = nltk.tokenize.word_tokenize(text)
allWordDist = nltk.FreqDist(w.lower() for w in allWords)
stopwords = set(stopwords.words('english'))

allWordExceptStopDist = nltk.FreqDist(w.lower() for w in allWords if w not in stopwords)
mostCommon = allWordExceptStopDist.most_common(25)

frame = pd.DataFrame(mostCommon, columns=['word', 'frequency'])
frame.set_index('word', inplace=True)
print(frame)
histog = frame.plot(kind='barh')
plt.show()

Thank you very much for any help!
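A minimal sketch of one way to make the pipeline above reusable for many files: the counting step takes the text and the stop-word set as parameters, and a thin wrapper reads one file and delegates to it. To keep the example self-contained it swaps in `collections.Counter` for `nltk.FreqDist` (FreqDist is itself a Counter subclass) and a regex split for `nltk.word_tokenize`, so token boundaries may differ slightly from NLTK's. The names `top_words`, `top_words_in_file`, and `EXAMPLE_STOP_WORDS` are hypothetical, and the tiny stop-word set only stands in for NLTK's English list.

```python
import re
import string
from collections import Counter

# Hypothetical stop-word set for illustration only; the question uses
# set(nltk.corpus.stopwords.words('english')) instead.
EXAMPLE_STOP_WORDS = {"the", "a", "and", "is", "to", "of"}


def top_words(text, stop_words, n=25):
    """Return the n most common words in text that are not stop words."""
    # Lowercase and delete ASCII punctuation in a single pass.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Regex split standing in for nltk.word_tokenize.
    words = re.findall(r"\w+", cleaned)
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


def top_words_in_file(path, stop_words, n=25):
    """Read one file and count it with top_words, so every file is treated alike."""
    with open(path, encoding="utf-8") as f:
        return top_words(f.read(), stop_words, n)
```

For example, `top_words("The cat and the hat. The cat!", EXAMPLE_STOP_WORDS, 2)` returns `[('cat', 2), ('hat', 1)]`. With NLTK installed, `set(stopwords.words('english'))` can be passed as `stop_words`, and calling `top_words_in_file` once per path gives one result list per file.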

  • So you're asking "how do I make a function"? [Here you go](https://docs.python.org/3/tutorial/controlflow.html#defining-functions). – Kevin Jun 09 '16 at 13:57
  • Basically yes, I somehow can't figure out how to write it as a function. – Khrystyna Kosenko Jun 09 '16 at 13:59
  • So your problem is writing a function in Python; it has nothing to do with file reading, DataFrames, or plotting. – Eular Jun 09 '16 at 14:08
  • Possible duplicate of [Basic explanation of python functions](http://stackoverflow.com/questions/32409802/basic-explanation-of-python-functions) – Amin Alaee Jun 09 '16 at 14:50

1 Answer


Is this what you meant?

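import nltk
import pandas as pd
import matplotlib.pyplot as plt
from string import punctuation
from nltk.corpus import stopwords

def readStuff(filename):
    with open(filename) as f:
        text = f.read()
    text = text.lower()
    # Remove every punctuation character
    for p in punctuation:
        text = text.replace(p, '')

    allWords = nltk.tokenize.word_tokenize(text)
    # Use a new name: assigning to `stopwords` inside the function would make
    # it a local variable and raise UnboundLocalError on this very line.
    stop_words = set(stopwords.words('english'))

    allWordExceptStopDist = nltk.FreqDist(w for w in allWords if w not in stop_words)
    mostCommon = allWordExceptStopDist.most_common(25)

    frame = pd.DataFrame(mostCommon, columns=['word', 'frequency'])
    frame.set_index('word', inplace=True)
    print(frame)
    frame.plot(kind='barh')
    plt.show()
    # Return the result so the caller can combine frames from several files
    return frame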
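To get the single DataFrame the question asks for, here is a minimal sketch, assuming each file's result is kept as a list of (word, count) pairs (for example the output of `most_common(25)`) in a dict keyed by file name. `combine_frequencies` is a hypothetical helper name. A word that is in one file's top list but not another's is filled with 0, which here means "not in that file's top 25" rather than a true zero count.

```python
import pandas as pd


def combine_frequencies(results):
    """Combine per-file (word, count) lists into one DataFrame.

    results: dict mapping a label (e.g. a file name) to a list of
    (word, count) pairs. Returns a DataFrame indexed by word with one
    column per label.
    """
    # Outer dict keys become columns, inner dict keys (words) become the index.
    columns = {label: dict(pairs) for label, pairs in results.items()}
    frame = pd.DataFrame(columns).fillna(0).astype(int)
    frame.index.name = "word"
    # Order rows by total count across all files, most common first.
    order = frame.sum(axis=1).sort_values(ascending=False).index
    return frame.loc[order]
```

The result can be plotted as in the question, e.g. `combine_frequencies(results).plot(kind='barh')` followed by `plt.show()`, which draws one bar per file for each word.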