I'm trying to get a .csv
file, filter some columns and create a new MySQL table with the new filtered df.
The issue is that I have one particular value within that .csv
with a different char set, as you can see below ('\ufeffY'
):
df['Supported'].unique()
array(['Y', '\ufeffY'], dtype=object)
After that, I created a new df
filtered with the columns renamed. I'm trying normalize those values. First, I'm removing unwanted possible spaces and then converting it to lower case (this column's value from the .CSV file should have in theory only "y" or "n")
df2['supported'] = df2['supported'].str.strip()
df2['supported'] = df2['supported'].str.lower()
Then, I'm trying to set to "?"
anything that is different from "y" or "n", which I believe would fix that '\ufeffY'
value from the original df.
df2.loc[~df2['supported'].isin(['y', 'n']), 'supported'] = '?'
If I check df2['supported'].unique()
after the operation above, I got this:
array(['y', '?'], dtype=object)
I assumed that the '?'
would be, maybe among other values, the '\ufeffY'
that I'm trying to get rid of, and then I'm trying to use that df2
with sqlalchemy
to create a new table by this:
df2.to_sql('ServersList', engine, index=False, if_exists='replace',
dtype={'supported': sqlalchemy.types.Text,
'server_name': sqlalchemy.types.Text,
'owner': sqlalchemy.types.Text,
'critical': sqlalchemy.types.Text,
'business_function': sqlalchemy.types.Text,
'IP': sqlalchemy.types.Text,
'activated': sqlalchemy.types.DateTime(),
'service_type': sqlalchemy.types.Text,
'platform': sqlalchemy.types.Text,
'OS': sqlalchemy.types.Text,
'classification': sqlalchemy.types.Text,
'support_team': sqlalchemy.types.Text,
'virtual_server': sqlalchemy.types.Text,
'host_name': sqlalchemy.types.Text,
'location': sqlalchemy.types.Text,
'log': sqlalchemy.types.DateTime()}
)
And that is when I got the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 10: character maps to <undefined>
The message is pretty clear, but I thought that df2 would not have that value anymore and therefore it should work.
Obviously I'm missing something here.