I have a DataFrame df loaded from a CSV file, and I count the words in one column ('short_description'):

import pandas as pd

df = pd.read_csv(r'C:\Users\username\Downloads\file_to_analyse.csv')
# list and count all words in short_description
word_counter = df.short_description.str.split(expand=True).stack().value_counts().to_string()

This gives me all the words and how often each one occurs in this column. But to show it visually in a chart (with pandas) I need to separate them into two lists: words = [] and amount = [].

Then I can build the x-axis and y-axis from them. But word_counter is actually one big string, I think.

I tried:

#separate numbers
amount = [int(s) for s in word_counter.split() if s.isdigit()]

#separate words
word_list = []
separate_words = ''.join([i for i in word_counter if not i.isdigit()])
word_list.append(separate_words)

This word_list does not give me single words, just one big string with many spaces.

  • Welcome to Stack Overflow! Since we do not have access to the `csv` file stored on your machine, please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing, as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888) for more information. – Henry Ecker Jun 07 '21 at 15:18

1 Answer

I think you could do this by converting the counts to a dictionary instead of a string:

word_counter = df.short_description.str.split(expand=True).stack().value_counts().to_dict()


#separate numbers
amount = list(word_counter.values())

#separate words
word_list = list(word_counter.keys())
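As a rough sketch of the same idea without the dictionary step: `.to_string()` is what turned the result into one big string, and leaving it off keeps the pandas Series that `value_counts()` returns, with the words in the index and the counts as values. The sample DataFrame below is made up for illustration and only stands in for the CSV file from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV file from the question
df = pd.DataFrame({
    "short_description": [
        "printer not working",
        "printer offline",
        "password reset",
    ]
})

# Without .to_string(), value_counts() returns a Series:
# the words are the index, the counts are the values
word_counts = df["short_description"].str.split(expand=True).stack().value_counts()

# separate words (x-axis) and numbers (y-axis)
word_list = word_counts.index.tolist()
amount = word_counts.tolist()

print(word_list[0], amount[0])
```

If matplotlib is installed, the Series can probably be plotted directly with `word_counts.plot.bar()`, in which case the two separate lists may not be needed at all.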
falafelocelot