In this first example we save two Unicode strings in a file while delegating to codecs the task of encoding them.

# -*- coding: utf-8 -*-
import codecs
cities = [u'Düsseldorf', u'天津市']
with codecs.open("cities", "w", "utf-8") as f:
    for c in cities:
        f.write(c)

We now do the same thing, first saving the two names to Redis, then reading them back and writing what we read to a file. Because the data we read back is already UTF-8 encoded, we skip decoding and re-encoding for that part.

# -*- coding: utf-8 -*-
import redis
r_server = redis.Redis('localhost') #, decode_responses = True)
cities_tag = u'Städte'
cities = [u'Düsseldorf', u'天津市']
for city in cities:
    r_server.sadd(cities_tag.encode('utf8'),
                  city.encode('utf8'))

with open(u'someCities.txt', 'w') as f:
    while r_server.scard(cities_tag.encode('utf8')) != 0:
        city_utf8 = r_server.srandmember(cities_tag.encode('utf8'))
        f.write(city_utf8)
        r_server.srem(cities_tag.encode('utf8'), city_utf8)

How can I replace the line

r_server = redis.Redis('localhost')

with

r_server = redis.Redis('localhost', decode_responses = True)

to avoid the wholesale introduction of .encode/.decode when using redis?

Calaf
  • unrelated: [use `io.open()` instead of `codecs.open()`](https://www.python.org/dev/peps/pep-0400/#abstract) – jfs Mar 02 '16 at 11:55

1 Answer

I'm not sure that there is a problem.

If you remove all of the .encode('utf8') calls in your code it produces a correct file, i.e. the file is the same as the one produced by your current code.

>>> r_server = redis.Redis('localhost')
>>> r_server.keys()
[]
>>> r_server.sadd(u'Hauptstädte', u'東京', u'Godthåb',u'Москва')
3
>>> r_server.keys()
['Hauptst\xc3\xa4dte']
>>> r_server.smembers(u'Hauptstädte')
set(['Godth\xc3\xa5b', '\xd0\x9c\xd0\xbe\xd1\x81\xd0\xba\xd0\xb2\xd0\xb0', '\xe6\x9d\xb1\xe4\xba\xac'])

This shows that keys and values are stored UTF-8 encoded, so the explicit .encode('utf8') calls are not required: the redis module encodes unicode arguments itself, and its default encoding is UTF-8. This can be changed by passing an encoding when creating the client, e.g. redis.Redis('localhost', encoding='iso-8859-1'), but there's no reason to.
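
To make the byte strings in the session above less mysterious, here is a minimal sketch that reproduces the encoding step locally, without a Redis server. It only illustrates what encoding a unicode key produces; the variable names are made up for this example, and it assumes the client simply calls the equivalent of .encode() with its configured encoding.

```python
# -*- coding: utf-8 -*-
# Illustration only: mimics the encoding step locally, no Redis server involved.
key = u'Hauptstädte'

# Bytes produced with UTF-8, the encoding the answer describes as the default.
utf8_key = key.encode('utf-8')

# Bytes produced if the client were created with encoding='iso-8859-1' instead.
latin1_key = key.encode('iso-8859-1')

print(repr(utf8_key))
print(repr(latin1_key))
```

The UTF-8 result matches the key shown by r_server.keys() above, which is why adding your own .encode('utf8') changes nothing.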

If you enable response decoding with decode_responses=True then the responses will be converted to unicode using the client connection's encoding. This just means that you don't need to explicitly decode the returned data, redis will do it for you and give you back a unicode string:

>>> r_server = redis.Redis('localhost', decode_responses=True)
>>> r_server.keys()
[u'Hauptst\xe4dte']
>>> r_server.smembers(u'Hauptstädte')
set([u'Godth\xe5b', u'\u041c\u043e\u0441\u043a\u0432\u0430', u'\u6771\u4eac'])
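
As a rough picture of what response decoding saves you from writing, the sketch below decodes the byte strings from the first session by hand. decode_members is a hypothetical helper invented for this example, not part of the redis module, and it assumes the data was stored as UTF-8.

```python
# -*- coding: utf-8 -*-
# Byte strings as returned without decode_responses (copied from the session above).
raw_members = set([b'Godth\xc3\xa5b',
                   b'\xd0\x9c\xd0\xbe\xd1\x81\xd0\xba\xd0\xb2\xd0\xb0',
                   b'\xe6\x9d\xb1\xe4\xba\xac'])


def decode_members(members, encoding='utf-8'):
    """Hypothetical helper: decode every byte string in a set to unicode."""
    return set(m.decode(encoding) for m in members)


decoded = decode_members(raw_members)
print(decoded)
```

The decoded set contains the same unicode strings that the decode_responses=True client returns directly.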

So, in your second example where you write data retrieved from redis to a file, if you enable response decoding then you get unicode strings back and need to open the output file with the desired encoding. If that is the default encoding (ASCII on Python 2, the locale's preferred encoding on Python 3) then you can just use open(). Otherwise you can use codecs.open() or manually encode the data before writing to the file.

import codecs

cities_tag = u'Hauptstädte'
with codecs.open('capitals.txt', 'w', encoding='utf8') as f:
    while r_server.scard(cities_tag) != 0:
        city = r_server.srandmember(cities_tag)
        f.write(city + '\n')
        r_server.srem(cities_tag, city)
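
The comment on the question suggests io.open() instead of codecs.open(). A minimal sketch of that variant follows, with a plain list standing in for the unicode strings that a decode_responses=True client would return, so it runs without a Redis server. The temporary path and the variable names are assumptions made for this example.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-in for the unicode strings a decode_responses=True client would return.
cities = [u'東京', u'Godthåb', u'Москва']

# Hypothetical output location; a temp directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'capitals.txt')

# io.open() encodes unicode on write using the encoding given here.
with io.open(path, 'w', encoding='utf-8') as f:
    for city in cities:
        f.write(city + u'\n')

# Reading with the same encoding gives the unicode strings back.
with io.open(path, 'r', encoding='utf-8') as f:
    read_back = f.read().splitlines()
```

Note that io.open() accepts only unicode in text mode, hence the u'\n' rather than '\n'.
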
mhawke