I am creating a bag of words, following this tutorial: https://pythonprogramminglanguage.com/bag-of-words/#respond. Here is my code:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

df = pd.read_csv('Twidb11.csv', error_bad_lines=False, sep='delimiter', engine='python')

# Creating Bag of Words
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(df.Text)
print(X_train_counts.todense())  # reuse the fitted matrix instead of fitting a second time
#X_train_counts.shape
print(count_vect.vocabulary_)
It gives me the words and their frequency, but the words are not in alphabetical order and each word has a u' prefix, as shown below. How can I get rid of this?
Output: {u'binance': 28, u'they': 139, u'just': 83, u'global': 67, u'alternatives': 11, u'zcash': 168, u'years': 165, u'talks': 133, u'japan': 82, u'yes': 166, u'25': 1, u'chinese': 37, u'6000': 5, u'zzzpositive': 170, u'winner': 162, u'28': 2, u'actually': 12 ....}
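For reference, here is a minimal sketch of the kind of output I am after. It assumes that vocabulary_ is an ordinary dict; as far as I can tell from the scikit-learn documentation, it maps each term to its column index in the count matrix rather than to a frequency. The `vocabulary` dict below is a hypothetical stand-in built from a few of the entries above, so the snippet runs without my CSV file. The u' prefix seems to be only how Python 2 displays unicode strings inside a dict, so printing each term on its own should avoid it.

```python
# Hypothetical stand-in for count_vect.vocabulary_ (term -> column index),
# using a few entries copied from the output above.
vocabulary = {u'binance': 28, u'they': 139, u'just': 83,
              u'global': 67, u'zcash': 168, u'25': 1}

# sorted() on the (term, index) pairs orders them by term first,
# which gives an alphabetical listing (digits sort before letters).
sorted_vocab = sorted(vocabulary.items())

for term, index in sorted_vocab:
    # Formatting the term with %s prints its text rather than its repr,
    # so no u'' prefix is shown, even on Python 2.
    print("%s: %d" % (term, index))
```

With the real vectorizer, `vocabulary` would be replaced by `count_vect.vocabulary_`.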