I'm working on taking csv files and putting them into a postgreSQL database. For one of the files though, every field is surrounded by quotes (When looking at it in Excel it looks normal. In notepad though, one row looks like "Firstname","Lastname","CellNumber","HomeNumber",etc. when it should look like Firstname,Lastname,CellNumber,HomeNumber). It breaks when I tried to load it into SQL.
I tried loading the file into python to do data cleaning, but i'm getting an error:
This is the code I'm running to load in the file in python:
import pandas as pd
logics = pd.read_csv("test.csv")
and this is the error I'm getting:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 28682: invalid continuation byte
I tried encoding it into utf-8, but that gave me a different error. code:
import pandas as pd
logics = pd.read_csv("test.csv", encoding= 'utf-8')
error:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 12 fields in line 53, saw 14
For whatever reason, when I manually save the file in file explorer as UTF-8 and then save it back again as a CSV file it removes the quotation marks, but I need to automate this process. Is there any way I can use python to remove these quotation marks? Is it just some different kind of encoding?