I have this script. It reads a file of collected tweets, cleans the text, computes a frequency distribution, and creates a plot. Right now it only works with one file. What I need is to turn it into a function so that I can pass in several files, then build a DataFrame with the FreqDist results from all of them and plot it.
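import nltk
import pandas as pd
import matplotlib.pyplot as plt
from string import punctuation
from nltk.corpus import stopwords

f = open(.......)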
text = f.read()
text = text.lower()
for p in list(punctuation):
    text = text.replace(p, '')
allWords = nltk.tokenize.word_tokenize(text)
allWordDist = nltk.FreqDist(w.lower() for w in allWords)
# Use a separate name so the imported nltk `stopwords` module is not overwritten
stop_words = set(stopwords.words('english'))
allWordExceptStopDist = nltk.FreqDist(w.lower() for w in allWords if w not in stop_words)
mostCommon = allWordExceptStopDist.most_common(25)
frame = pd.DataFrame(mostCommon, columns=['word', 'frequency'])
frame.set_index('word', inplace=True)
print(frame)
histog = frame.plot(kind='barh')
plt.show()
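
A rough sketch of how the steps above could be wrapped in functions that accept several files. To keep it runnable without NLTK data, it swaps in `collections.Counter` for `nltk.FreqDist` and `str.split()` for `word_tokenize`, and uses a tiny hand-written stopword set. The names `STOP_WORDS`, `word_frequencies` and `frequencies_for_files` are hypothetical, not part of the script above.

```python
from collections import Counter
from string import punctuation

# Hypothetical stand-in for nltk's English stopword list, kept tiny so the
# sketch runs without downloading any NLTK data.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "in"}


def word_frequencies(text, stop_words=STOP_WORDS, top_n=25):
    """Return the top_n most common non-stopword tokens in text as a dict."""
    text = text.lower()
    # Strip all punctuation in one pass instead of one replace() per character.
    text = text.translate(str.maketrans("", "", punctuation))
    # str.split() is a simplification of nltk.tokenize.word_tokenize.
    words = [w for w in text.split() if w not in stop_words]
    return dict(Counter(words).most_common(top_n))


def frequencies_for_files(paths, top_n=25):
    """Map each file path to its word-frequency dict."""
    results = {}
    for path in paths:
        # The encoding is an assumption; adjust it to match the tweet files.
        with open(path, encoding="utf-8") as f:
            results[path] = word_frequencies(f.read(), top_n=top_n)
    return results
```

If pandas is available, passing the returned dict to `pd.DataFrame(...)` should give one column per file and one row per word, with NaN where a word does not appear in a file's top list. That frame could then be plotted with `plot(kind='barh')` as in the script above.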
Thank you very much for any help!