I have an application where I have to store people's names and make them searchable. The technologies I am using are python (v2.7.6) django (v1.9.5) rest framwork. The dbms is postgresql (v9.2). Since the user names can be arabic we are using utf-8 as db encoding. For search we are using haystack (v2.4.1) with Amazon Elastic Search for indexing. The index was building fine a few days ago but now when I try to rebuild it with
python manage.py rebuild_index
it fails with the following error
'ascii' codec can't decode byte 0xc3 in position 149: ordinal not in range(128)
The full error trace is
File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 188, in handle_label
self.update_backend(label, using)
File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 233, in update_backend
do_update(backend, index, qs, start, end, total, verbosity=self.verbosity, commit=self.commit)
File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 96, in do_update
backend.update(index, current_qs, commit=commit)
File "/usr/local/lib/python2.7/dist-packages/haystack/backends/elasticsearch_backend.py", line 193, in update
bulk(self.conn, prepped_docs, index=self.index_name, doc_type='modelresult')
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 188, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 85, in _process_bulk_chunk
resp = client.bulk('\n'.join(bulk_actions) + '\n', **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 69, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 795, in bulk
doc_type, '_bulk'), params=params, body=self._bulk_body(body))
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 329, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_requests.py", line 68, in perform_request
response = self.session.request(method, url, data=body, timeout=timeout or self.timeout)
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 455, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 558, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 330, in send
timeout=timeout
File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 558, in urlopen
body=body, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 353, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python2.7/httplib.py", line 979, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 833, in _send_output
msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 149: ordinal not in range(128)
My guess is that befor we didn't have arabic characters in our database so the index was building fine but now since users have entered arabic chars the index fails to build.