29

Is there an easy way to dump UTF-8 data from a database?

I know this command:

manage.py dumpdata > mydata.json

But in the resulting file, mydata.json, the Unicode data looks like this:

"name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8"

I would like to see a real Unicode string like 全球卫星定位系统 (Chinese).

Peter Mortensen
icn

13 Answers

18

After struggling with similar issues, I've just found that the XML formatter handles UTF-8 properly.

manage.py dumpdata --format=xml > output.xml

I had to transfer data from Django 0.96 to Django 1.3. After numerous tries with dump/load data, I've finally succeeded using xml. No side effects for now.

I hope this helps someone, as I landed on this thread while looking for a solution.

Tisho
  • 1
    Same error with xml `django.db.utils.OperationalError: Problem installing fixture '/app/tours/fixtures/tours.xml': Could not load tours.Tour(pk=06541d20-a873-11e9-b91d-5b320e2b2922): (1366, "Incorrect string value: '\\xCC\\x88kull...' for column 'description' at row 1") ` – Kiran Reddy Oct 07 '19 at 13:09
  • Yeah this totally didn't work on mine. I'm missing the `é` character. – Zack Plauché Feb 05 '22 at 13:17
13

This solution from @Julian Polard's post worked for me.

Basically, just add -Xutf8 right after py or python when running this command:

python -Xutf8 manage.py dumpdata > data.json

Please upvote his answer as well if this worked for you ^_^
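For context, here is a minimal sketch of what the -X utf8 option changes, assuming Python 3.7 or later (where UTF-8 mode exists). The helper name child_stdout_encoding is made up for this illustration. It starts a child interpreter with its output piped, which is the same situation as redirecting dumpdata into a file, and reports the encoding the child uses for stdout.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_args):
    """Return the stdout encoding reported by a child interpreter whose output is piped."""
    env = dict(os.environ)
    # PYTHONIOENCODING would override the encoding, so drop it for a clean comparison.
    env.pop("PYTHONIOENCODING", None)
    cmd = [sys.executable, *extra_args, "-c", "import sys; print(sys.stdout.encoding)"]
    result = subprocess.run(cmd, capture_output=True, text=True, env=env, check=True)
    return result.stdout.strip()


# Without the flag the encoding follows the locale (often a legacy code page on Windows).
print("default:", child_stdout_encoding([]))
# With the flag Python's UTF-8 mode is enabled, so redirected output is UTF-8.
print("-X utf8:", child_stdout_encoding(["-X", "utf8"]))
```

If the first line prints something other than UTF-8 on your machine, that is likely why a plain redirect of dumpdata loses or rejects non-ASCII characters.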

Zack Plauché
12

django-admin.py dumpdata yourapp can dump an app's data for that purpose.

Or if you use MySQL, you could use the mysqldump command to dump the whole database.

And this thread has many ways to dump data, including manual methods.

UPDATE: because OP edited the question.

To convert the JSON-escaped strings into human-readable text, you could use this (Python 2 only, since it calls decode on a str):

open("mydata-new.json","wb").write(open("mydata.json").read().decode("unicode_escape").encode("utf8"))
Community
YOU
  • thanks, i know this command, but the data i got in the file mydata.json , unicode data looks like "name": "\u4e1c\u6cf0\u9999\u6e2f\u4e94\u91d1\u6709\u9650\u516c\u53f8" I would like to see real unicode string like '全球卫星定位系统'(Chinese) – icn Jan 26 '10 at 04:30
  • Added some code to convert that. I am not sure whether the built-in dumpdata function can do it. – YOU Jan 26 '10 at 04:38
  • 2
    AttributeError: 'str' object has no attribute 'decode' – Kiran Reddy Oct 07 '19 at 13:32
6

You need to either find the call to json.dump*() in the Django code and pass the additional option ensure_ascii=False and then encode the result after, or you need to use json.load*() to load the JSON and then dump it with that option.
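The second option (load the JSON, then dump it again with ensure_ascii=False) can be sketched as follows. This is only an illustration using the standard json module; the function name reencode_fixture and the indent value are assumptions, not part of Django.

```python
import json


def reencode_fixture(src_path, dst_path):
    """Rewrite a JSON fixture so non-ASCII characters are stored as real UTF-8 text."""
    # \uXXXX escapes are plain ASCII, so reading the dump as UTF-8 is safe.
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    # ensure_ascii=False keeps characters such as 东 as-is instead of escaping them.
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(data, dst, ensure_ascii=False, indent=2)
```

For example, reencode_fixture("mydata.json", "mydata-new.json") would produce a file that loaddata can still read, because both forms are equivalent JSON.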

Ignacio Vazquez-Abrams
5

Here I wrote a snippet for that. Works for me!

dir01
4

You can create your own serializer that passes ensure_ascii=False to json.dump:

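# serializers/json_no_uescape.py
# Deserializer is re-exported because Django expects a serializer module
# to provide both a Serializer and a Deserializer.
from django.core.serializers.json import Deserializer  # noqa: F401
from django.core.serializers.json import Serializer as JSONSerializer


class Serializer(JSONSerializer):
    """JSON serializer that writes non-ASCII characters unescaped."""

    def _init_options(self):
        super(Serializer, self)._init_options()
        # Forwarded to the json module; turns off \uXXXX escaping.
        self.json_kwargs['ensure_ascii'] = False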

Then register the new serializer (for example, in your app's __init__.py file):

from django.core.serializers import register_serializer

register_serializer('json-no-uescape', 'serializers.json_no_uescape')

Then you can run:

manage.py dumpdata --format=json-no-uescape > output.json
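As an alternative to calling register_serializer by hand, Django's SERIALIZATION_MODULES setting maps format names to serializer modules. A sketch for settings.py, assuming the same module path as in the register_serializer call above:

```python
# settings.py (fragment)
# Maps the --format name to the module that defines Serializer and Deserializer.
SERIALIZATION_MODULES = {
    "json-no-uescape": "serializers.json_no_uescape",
}
```

With this in place, the dumpdata command above should pick up the format without any code in __init__.py.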

Victor Akimov
2

YOU's accepted answer is good, but note that Python 3 distinguishes between text and binary data, so both files must be opened in binary mode:

open("mydata-new.json","wb").write(open("mydata.json", "rb").read().decode("unicode_escape").encode("utf8"))

Otherwise, the error AttributeError: 'str' object has no attribute 'decode' will be raised.
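One caveat worth checking before relying on unicode_escape: the codec decodes every backslash escape, not only \uXXXX, so a string value that contains escaped quotes no longer parses as JSON afterwards. A small sketch with a made-up sample value, compared against a json round trip:

```python
import json

# Valid JSON with an escaped quote and an escaped CJK character (sample data).
raw = b'{"name": "a \\"quoted\\" \\u4e1c"}'

# unicode_escape also turns \" into a bare quote, which ends the string early.
decoded = raw.decode("unicode_escape")

try:
    json.loads(decoded)
    still_valid = True
except json.JSONDecodeError:
    still_valid = False

# Parsing and re-dumping keeps the structure intact and still unescapes the text.
roundtrip = json.dumps(json.loads(raw), ensure_ascii=False)

print("valid after unicode_escape:", still_valid)
print("json round trip:", roundtrip)
```

If the fixture will be fed back to loaddata, the json round trip is the safer of the two.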

Ali Shamakhi
1

I usually add the following lines to my Makefile:

.PHONY: dump

# make APP=core MODEL=Schema dump
dump:
    @python manage.py dumpdata --indent=2 --natural-foreign --natural-primary ${APP}.${MODEL} | \
    python -c "import sys; sys.stdout.write(sys.stdin.read().encode().decode('unicode_escape'))" \
    > ${APP}/fixtures/${MODEL}.json

This works for the standard Django project structure; adjust the paths if your project is laid out differently.

Denis Eliseev
1

This problem has been fixed for both JSON and YAML in Django 3.1.

highpost
1

Here's a new solution.

I just shared a repo on GitHub: django-dump-load-utf8.

It is a workable solution, but I think this is a bug in Django, and fixing it there would be better. I hope someone can merge my project into Django.

manage.py dumpdatautf8 --output data.json
manage.py loaddatautf8 data.json
wolfpan
0

# Python 2 only: the 'string-escape' codec does not exist in Python 3.
import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, 'r').read().decode('string-escape')
codecs.open(dst, "wb").write(source)
darthwade
0

I encountered the same issue. After reading all the answers, I came up with a mix of Ali and darthwade's answers:

manage.py dumpdata app.category --indent=2 > categories.json
manage.py shell

import codecs
src = "/categories.json"
dst = "/categories-new.json"
source = codecs.open(src, "rb").read().decode('unicode-escape')
codecs.open(dst, "wb","utf-8").write(source)

In Python 3, I had to open the source file in binary mode and decode it as unicode-escape. I also passed utf-8 as the encoding when opening the output file for writing.

I hope it helps :)

0

Here is the solution from djangoproject.com:

In Windows Settings, there is a "Use Unicode UTF-8 for worldwide language support" box under "Language" - "Administrative Language Settings" - "Change system locale" - "Region Settings". If we apply that and reboot, then we get a sensible, modern default encoding from Python.

Wertartem