0

I read the symbol °C using the xlrd library and get the unicode value u'\xb0C'. However, I want to use it as a normal string.

I went through a couple of posts, including the one linked below:

Convert a Unicode string to a string in Python (containing extra symbols)

It seems to work for many special symbols, but in this case I see only C, without the ° (degree) sign. Any help would be much appreciated.

karpanai
  • `unicodedata.normalize('NFKD', u'ºC').encode('ascii', 'ignore')` returns `'oC'`, the closest ASCII representation. Is this not what you want? – Jon Gauthier Dec 13 '12 at 14:30
  • As I mentioned above, I am reading the value using xlrd, so I only get u'\xb0C'. Is there any way to read it in a different format? – karpanai Dec 13 '12 at 14:42
  • No. `xlrd` docs say specifically that they return all data in Python unicode. You can, however, convert the unicode objects returned to another encoding (UTF-8 rather than ASCII, for instance), as already described in the answers. – Silas Ray Dec 13 '12 at 14:49
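
The gap between the `'oC'` result in the first comment and the bare `C` described in the question seems to come down to which character is involved. The comment's example uses º (U+00BA, masculine ordinal indicator), which has a compatibility decomposition to `o`. The string read by xlrd contains ° (U+00B0, degree sign), which has no decomposition, so `'ignore'` simply drops it. A minimal sketch, written for Python 3 (on Python 2 the results would be `str` rather than `bytes`); the helper name `to_ascii` is made up for illustration:

```python
import unicodedata


def to_ascii(text):
    """NFKD-normalize, then drop anything that is still not ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore')


ordinal = to_ascii(u'\xbaC')  # U+00BA masculine ordinal indicator, as in the comment
degree = to_ascii(u'\xb0C')   # U+00B0 degree sign, as returned by xlrd

print(ordinal)
print(degree)
```

So the normalize-and-ignore recipe cannot keep the degree sign; it can only discard it.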

2 Answers

3

Maybe I don't understand something, but:

>>> print u'\xb0C'.encode("UTF-8")
°C
  • In this case print converts the unicode string. However, in my case I need to write °C to another file and generate a PDF – karpanai Dec 13 '12 at 14:40
  • As long as the "other file" and the PDF generation tool you are using support UTF-8 (which they probably do), just write the UTF-8 encoded string, using the `encode()` method presented in this answer, to wherever it needs to go. What libraries are you using for this conversion process, and what file formats are you trying to write to? – Silas Ray Dec 13 '12 at 14:43
  • I was looking for this when reading special accent characters on excel using xlrd – Nwawel A Iroume Sep 29 '17 at 08:08
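
For the file-writing case raised in the comments, one option is to let the file object do the encoding. This is a minimal sketch, assuming the downstream tool reads UTF-8. The file name `units.txt` and the temporary directory are purely illustrative, and `io.open` is used because it accepts an `encoding` argument on both Python 2 and Python 3:

```python
import io
import os
import tempfile

value = u'\xb0C'  # what xlrd returns for the cell, per the question

# Hypothetical output location, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'units.txt')

# The file object encodes unicode text to UTF-8 on write.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(value)

# The raw bytes on disk hold the two-byte UTF-8 form of the degree sign.
with io.open(path, 'rb') as f:
    raw = f.read()

# Decoding with the same encoding restores the original text.
with io.open(path, 'r', encoding='utf-8') as f:
    restored = f.read()
```

Whether the PDF step then renders the character correctly depends on the tool and font in use, which the thread does not specify.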
2

If by "normal string" you mean an ASCII-encoded string, then you can't do exactly what you want. The degree symbol is not part of the ASCII character set, so the best you can hope to do is either drop it or convert it to a best-approximation character from the ASCII character set. You could choose a different encoding; however, you have to be sure that whatever systems you are interacting with will work with the encoding you choose. UTF-8 is usually a safe bet, and can encode pretty much any character you're ever likely to run into.
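
The three choices described above (drop, approximate, or switch encoding) can be sketched as follows. This assumes Python 3, where `encode()` returns `bytes`, and the `'deg '` substitute is an arbitrary hand-picked spelling, not something any library provides:

```python
text = u'\xb0C'

# A strict ASCII encode fails, because U+00B0 is outside the ASCII range.
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Option 1: drop whatever ASCII cannot represent.
dropped = text.encode('ascii', 'ignore')

# Option 2: substitute an ASCII approximation chosen by hand.
approx = text.replace(u'\xb0', u'deg ').encode('ascii')

# Option 3: keep the character and use an encoding that can hold it.
utf8 = text.encode('utf-8')
```

Only the third option round-trips: `utf8.decode('utf-8')` gives back the original text, while the first two lose information.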

Silas Ray