
I am new to Python. I have a CSV file of cleaned tweets, and I want to build a bag of words from them. I have the following code, but it's not working correctly.

import pandas as pd
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer

data = pd.read_csv(open("Twidb11.csv"), sep=' ')
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(data.Text)
count_vect.vocabulary_

Error:

.ParserError: Error tokenizing data. C error: Expected 19 fields in line 5, saw 22

aaa
kavya sharma

1 Answer


I think this is a duplicate. You can see the answers here; there are a lot of answers and comments there.
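The parser error itself most likely comes from sep=' ' in the question's code: with a space as the separator, every word of a tweet becomes its own field, so each row has as many fields as its tweet has words. Roughly, pandas takes the expected field count from the header line and raises as soon as a later row has more. A minimal sketch in plain Python, with made-up tweets standing in for Twidb11.csv, shows how uneven those counts are:

```python
# Hypothetical sample: a header line plus one cleaned tweet per line,
# standing in for the real contents of Twidb11.csv
sample = """Text
good morning everyone
so tired today
cannot wait for the weekend to start already
"""

# With sep=' ', every space starts a new field, so the number of fields
# in a row is simply the number of words on that line
field_counts = [len(line.split(' ')) for line in sample.splitlines()]
print(field_counts)  # [1, 3, 3, 8]
```

No fixed column count fits rows like these, which is why the solutions here either skip the offending rows or stop splitting on spaces altogether.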

So, possible solutions are:

# pandas >= 1.3 spells this on_bad_lines='skip'; error_bad_lines=False was removed in pandas 2.0
data = pd.read_csv('Twidb11.csv', on_bad_lines='skip')
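A minimal sketch of what skipping bad lines actually does, assuming pandas >= 1.3; the inline CSV text is made up and only stands in for Twidb11.csv. Keep the cost in mind: a skipped tweet is silently dropped and never reaches the bag of words.

```python
import io

import pandas as pd

# Hypothetical CSV text: the tweet on the third line contains an unquoted
# comma, so that row has 3 fields while the header only has 2
raw = """id,Text
1,good morning everyone
2,so tired, really tired
3,cannot wait for the weekend
"""

# on_bad_lines='skip' drops rows with too many fields instead of raising
df = pd.read_csv(io.StringIO(raw), on_bad_lines='skip')
print(df)
print(df['id'].tolist())  # [1, 3]: the malformed tweet is gone
```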

Or:

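# 'delimiter' is a multi-character separator, so pandas treats it as a regex; ask for the python engine explicitly
df = pd.read_csv('Twidb11.csv', sep='delimiter', header=None, engine='python')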

"In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. Thus saith the docs: 'If file contains no header row, then you should explicitly pass header=None'. In this instance, pandas automatically creates whole-number indices for each field {0,1,2,...}."

The trick with sep='delimiter' is that this string never appears in the tweets, so no line is split at all and each whole tweet ends up in a single column, 0.

Vasily Bronsky