How to create a bag of words from a CSV file in Python?

Question

I am new to Python. I have a CSV file that contains cleaned tweets, and I want to create a bag of words from them. I have the following code, but it's not working correctly.

import pandas as pd
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer

data = pd.read_csv(open("Twidb11.csv"), sep=' ')
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(data.Text)
count_vect.vocabulary_

Error:

.ParserError: Error tokenizing data. C error: Expected 19 fields in line 5, saw 22

Possible duplicate of [Python Pandas Error tokenizing data](https://stackoverflow.com/questions/18039057/python-pandas-error-tokenizing-data) — Vasily Bronsky, Dec 22 '17 at 06:11
It would be useful to clarify *where exactly* in your code the error occurs... — desertnaut, Dec 22 '17 at 11:54
When I run the code now, I get this error: 'DataFrame' object has no attribute 'Text' — kavya sharma, Dec 24 '17 at 01:54

score 0 · Accepted Answer · answered Dec 22 '17 at 06:13

I think this is a duplicate. You can see the answer here. There are a lot of answers and comments.

So, a possible solution is:

data = pd.read_csv('Twidb11.csv', error_bad_lines=False)

Or:

df = pd.read_csv(fileName, sep='delimiter', header=None)

"In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. Thus saith the docs: 'If file contains no header row, then you should explicitly pass header=None'. In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}."
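For reference, here is a minimal end-to-end sketch of the approach, under some assumptions that are not confirmed by the question: the file is comma-separated, it has a header row, and the tweet column is named Text. The inline sample data stands in for Twidb11.csv and is made up. Note that on_bad_lines="skip" requires pandas 1.3 or newer; it replaces error_bad_lines=False, which was deprecated in 1.3 and removed in pandas 2.0. Printing data.columns first is a quick way to check which column names pandas actually parsed; with header=None they will be the integers 0, 1, 2, ... rather than names.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for Twidb11.csv: comma-separated, header row,
# tweets in a column named "Text". The last row has one field too many,
# similar to the line that triggers "Error tokenizing data".
raw = io.StringIO(
    "Id,Text\n"
    "1,good morning world\n"
    "2,good night\n"
    "3,broken row,extra field\n"
)

# Skip malformed rows instead of raising ParserError (pandas >= 1.3).
data = pd.read_csv(raw, sep=",", on_bad_lines="skip")

# Inspect the parsed column names before accessing one of them.
print(list(data.columns))

text_column = "Text"  # assumed column name; adjust to match your file
if text_column not in data.columns:
    raise KeyError(f"Column {text_column!r} not found in {list(data.columns)}")

# Build the bag of words: one row per tweet, one column per vocabulary word.
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(data[text_column].astype(str))

print(count_vect.vocabulary_)      # word -> column index
print(X_train_counts.toarray())    # per-tweet word counts
```

To use this with a real file, pass the file path to pd.read_csv in place of raw and set sep to whatever delimiter the file actually uses.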

When I run the code now, I get this error: 'DataFrame' object has no attribute 'Text' — kavya sharma, Dec 22 '17 at 08:35
