
There are many questions and fixes for this, but none of them seems to work for me. My problem is that I am reading a file of strings and loading each line into a DB.

In the file it looks like normal text, while in the DB it is read as a Unicode space. I tried replacing it with a space and similar options, but none of them worked.

For example, in the text file the string looks like:

The abrupt departure

After it is inserted into the DB, it looks like:

The abrupt departure

When I run a query for the data in the DB, it comes back as:

"The abrupt\xc2\xa0departure"

I tried the following:

if "\xc2\xa0" in str:
    str.replace('\xa0', ' ')
    str.replace('\xc2', ' ')
    print str

The code above prints the string as:

The abrupt departure

but when I insert it back into the DB, it is still the same.

Any help is appreciated.

user168983

3 Answers


Try this. It removes the non-ASCII characters:

>>> s = "The abrupt\xc2\xa0departure"
>>> s = s.decode('unicode_escape').encode('ascii','ignore')
>>> s
'The abruptdeparture'

Note that this drops the no-break space entirely, so the two words end up joined.

Or you can use replace as you tried, but you forgot to reassign the result to the same variable.

>>> s = "The abrupt\xc2\xa0departure"
>>> s = s.replace('\xc2\xa0', ' ')
>>> s
'The abrupt departure'
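Both sessions above are Python 2, where s is a byte string and the no-break space is the two bytes '\xc2\xa0'. A minimal Python 3 sketch of the same idea, assuming the line has already been decoded to str (for example by reading the file with encoding="utf-8"), might look like this. Replacing with a space rather than an empty string keeps the two words apart; NFKC normalization is shown as an alternative, but note that it also rewrites other compatibility characters (ligatures, full-width forms), so it is broader than a targeted replace.

```python
import unicodedata

# Assumption: the text is already a Python 3 str, so the no-break space is
# the single character U+00A0 rather than the bytes b'\xc2\xa0'.
s = "The abrupt\u00a0departure"

# Option 1: replace the no-break space with an ordinary space and reassign,
# since str.replace returns a new string instead of modifying s.
fixed = s.replace("\u00a0", " ")

# Option 2: NFKC normalization maps NO-BREAK SPACE (and other compatibility
# characters) to their plain equivalents.
normalized = unicodedata.normalize("NFKC", s)

print(repr(fixed))       # 'The abrupt departure'
print(repr(normalized))  # 'The abrupt departure'
```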
Harsha Biyani

The point is that strings are immutable, so you need to assign the return value of replace:

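 # Replace the two-byte UTF-8 no-break space with a single ordinary space.
 # (Replacing '\xa0' and '\xc2' separately with ' ' would leave two spaces.)
 s = s.replace('\xc2\xa0', ' ')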

Also, don't use str as a variable name, because it shadows the built-in str type.
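To illustrate both points, here is a small Python 3 sketch; the variable name line and the function shadowing_demo are hypothetical, chosen only for the demonstration. The first part shows that calling replace without reassigning leaves the string unchanged, and the second shows what goes wrong once the name str is bound to a string.

```python
line = "The abrupt\u00a0departure"

# str.replace never modifies a string in place; it returns a new string.
line.replace("\u00a0", " ")         # result is thrown away
print("\u00a0" in line)             # True: `line` is unchanged

line = line.replace("\u00a0", " ")  # reassign to keep the result
print(repr(line))                   # 'The abrupt departure'


def shadowing_demo():
    # Hypothetical example: binding the name `str` hides the built-in type
    # for the whole function body.
    str = "oops"
    try:
        return str(42)  # calls the string "oops", not the str type
    except TypeError as exc:
        return f"TypeError: {exc}"


print(shadowing_demo())  # TypeError: 'str' object is not callable
```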

Daniel Roseman

C2A0 is the UTF-8 encoding of U+00A0 "NO-BREAK SPACE". 'Â ' is what you see when those bytes are interpreted as latin1, which happens when your CHARACTER SET settings are inconsistent.
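A short Python 3 sketch of that mix-up, assuming latin-1 as the mismatched decoding (it is a common culprit, but the question does not say which setting is wrong): encoding the no-break space as UTF-8 gives the C2 A0 bytes seen in the query output, and decoding those bytes as latin-1 turns C2 into 'Â'. Reversing the wrong step recovers the original text, which shows the data is mis-decoded rather than lost.

```python
nbsp = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Its UTF-8 encoding is the two bytes C2 A0.
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0'

# Assumption: the bytes are decoded as latin-1 somewhere along the way.
# Each byte then becomes its own character: 'Â' (U+00C2) plus U+00A0.
original = "The abrupt" + nbsp + "departure"
mojibake = original.encode("utf-8").decode("latin-1")
print(repr(mojibake))  # 'The abruptÂ\xa0departure'

# Undoing the wrong decode restores the original string.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```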

Doing a replace() merely masks the problem, and it will not help when a different unexpected character shows up in your table.
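As a rough sketch of fixing it at the source instead, the snippet below uses Python 3's built-in sqlite3 purely as a stand-in, since the question does not name the database: decode the file once as UTF-8, pass str values to the driver as query parameters, and the text comes back exactly as written. If the real database is MySQL (an assumption suggested by the CHARACTER SET wording), the equivalent step is making the connection and the column character sets both UTF-8 (utf8mb4); the exact connection option depends on the driver. The table name titles and the sample lines are hypothetical.

```python
import sqlite3

# Hypothetical file contents; in real code, read them with an explicit
# encoding, e.g. open(path, encoding="utf-8").
lines = ["The abrupt\u00a0departure"]

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("CREATE TABLE titles (title TEXT)")
conn.executemany(
    "INSERT INTO titles (title) VALUES (?)",
    [(line,) for line in lines],
)

stored = [row[0] for row in conn.execute("SELECT title FROM titles")]
conn.close()

# The decoded text round-trips unchanged: one U+00A0 and no stray 'Â'.
print(stored == lines)  # True
```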

Since you have not provided enough info to say what you have done correctly versus incorrectly, let me point you at two references:

Rick James