Python pandas drop_duplicates inserts unnecessary " which leads to a CSV loading error

Question

In my project I load data from Twitter every other day and append it to a CSV file. This procedure leads to exact duplicates of tweets in the file, which is why I want to remove them.

However, when I run the following code:

import pandas as pd
data = pd.read_csv("Hashtags.csv", engine="python")

data.drop_duplicates(subset=None, inplace=True)
data.to_csv("Hashtags.csv",index = False)

and then try to load the CSV file, I get the following error:

pandas.errors.ParserError: ',' expected after '"'

Before I dropped the duplicates, I had no problems loading the file. It almost seems as if the drop_duplicates function inserts unnecessary " characters. Does anyone know how to solve this problem?

Thank you very much in advance!

If you are getting this error while loading the CSV, as you said, then it should not be related to `drop_duplicates`. Does loading still work when you remove the `drop_duplicates` line? — talatccan, Dec 24 '19 at 14:00
Try this: https://stackoverflow.com/questions/55010807/pandas-errors-parsererror-expected-after/56690122 — Paulo Marcelo, Dec 24 '19 at 14:09

score 0 · Answer 1 · answered Dec 28 '19 at 13:51

I have no idea why it works now, but I changed how I save the pandas DataFrame to the CSV file, and the error is gone. I now open the file myself with an explicit UTF-8 encoding and pass the file object to to_csv:

fname = 'Hashtags'
data.drop_duplicates(subset=None, inplace=True)

with open('%s.csv' % fname, 'w', encoding="utf8") as file:
    data.to_csv(file,index = False)
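
A minimal sketch of the same save step with the settings spelled out and a read-back check, in case it helps to reproduce this. `encoding` and `quoting=csv.QUOTE_ALL` are real `to_csv` parameters, but choosing them here is an assumption, not a confirmed fix for the original error. The `dedupe_csv` helper and the sample rows are hypothetical stand-ins for the real Hashtags.csv.

```python
import csv
import os
import tempfile

import pandas as pd


def dedupe_csv(path, encoding="utf8"):
    """Drop exact duplicate rows from a CSV file and rewrite it in place."""
    data = pd.read_csv(path, encoding=encoding)
    data = data.drop_duplicates()
    # Assumption: quoting every field keeps commas and quote characters
    # inside tweet text from being read as delimiters on the next load.
    data.to_csv(path, index=False, encoding=encoding, quoting=csv.QUOTE_ALL)
    # Read the file back right away to confirm it still parses.
    return pd.read_csv(path, encoding=encoding)


# Hypothetical sample data standing in for Hashtags.csv.
sample = pd.DataFrame(
    {
        "id": [1, 1, 2],
        "text": [
            'He said "hi", then left',
            'He said "hi", then left',
            "plain tweet",
        ],
    }
)
tmp_path = os.path.join(tempfile.mkdtemp(), "Hashtags.csv")
sample.to_csv(tmp_path, index=False, encoding="utf8")

result = dedupe_csv(tmp_path)
print(len(result))
```

Reading the file back immediately after writing it shows whether the save step itself produces a file that pandas can parse, which separates a writing problem from a problem in the data that was appended earlier.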
