Python: Can dumpdata, but cannot loaddata back. UnicodeDecodeError

Question

I have been using Python 2.7, Django 1.5 and PostgreSQL 9.2 for two weeks and have not seen this problem before. Everything is freshly installed on my Windows 7 machine, so it should have default settings. Django generates the tables in my database without trouble, and everything appears to work. I can dump data from my database by running:

manage.py dumpdata > test.json

or

manage.py dumpdata --indent=4 > test.json

The resulting JSON file looks as it should.

Then, I truncate some tables and try to load them from the JSON file with:

python manage.py loaddata --database=T2 test.json    # or without the database name

I got the following error:

“UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte”

If I open the test.json file in Notepad, save it as UTF-8 and try again, I get:

“No JSON object could be decoded”

The file still looks OK, not empty.

By the way, when I open the JSON file in Notepad, it offers to save it as Unicode. My database uses UTF-8 encoding. Please advise. Thank you.

Do not use Notepad to modify the code – Paulo Bu Jul 24 '13 at 20:08
Show the output of `print(repr(open('test.json', 'rb').read(4)))` – jfs Jul 25 '13 at 16:18

score 29 · Answer 1 · edited Jan 27 '20 at 14:17

What worked for me is following these steps:

- Open the file in regular notepad
- Select save as
- Select encoding "UTF-8" (Not "UTF-8 (With BOM)")
- Save the file.

Now you can use loaddata.

However, this only works for files that are small enough for notepad to open.
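For files too large for Notepad, the same re-encoding can be sketched in Python. This is a minimal sketch, assuming Python 3 and assuming the file is UTF-16 with a byte order mark (as Notepad's "Unicode" mode produces); the function name `reencode_to_utf8` and the output file name are hypothetical. If the file is UTF-8 with a BOM instead, passing `src_encoding="utf-8-sig"` should strip the BOM.

```python
import shutil


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Copy src_path to dst_path, re-encoded as UTF-8 without a BOM.

    The "utf-16" codec consumes the byte order mark while decoding, and
    the plain "utf-8" codec does not write one when encoding.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # Copies in chunks, so the whole file is never held in memory.
        shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    import os

    # Hypothetical file names; adjust to your own fixture.
    if os.path.exists("test.json"):
        reencode_to_utf8("test.json", "test_utf8.json")
```

After running it, `python manage.py loaddata test_utf8.json` would be the command to try.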

edited Jan 27 '20 at 14:17

answered Jan 22 '20 at 10:05

Ducktown


Achieved in Notepad++ by setting UTF-8 via Encoding -> UTF-8, then saving – andyw Aug 06 '21 at 05:50

score 7 · Answer 2 · edited May 23 '17 at 12:25

0xff in position 0 looks like the start of a little-endian UTF-16 byte order mark (the bytes 0xFF 0xFE) to me. Notepad's "Unicode" save mode is little-endian UTF-16, so that makes sense if you saved your JSON from Notepad after creating it. Notepad will keep a byte order mark even when saving as UTF-8, which could plausibly cause loaddata to fail to parse the file.
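To check which byte order mark a file actually starts with, the first few bytes can be compared against the BOM constants in Python's standard `codecs` module. This is only a sketch: the function names `sniff_bom` and `sniff_file` and the returned labels are made up for illustration, and a file without any BOM simply yields `None`.

```python
import codecs

# Longest marks first: the UTF-32 LE mark (FF FE 00 00) begins with the
# same two bytes as the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8 with BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def sniff_bom(head):
    """Return a label for the BOM at the start of `head` (bytes), or None."""
    for bom, label in BOMS:
        if head.startswith(bom):
            return label
    return None


def sniff_file(path):
    """Read the first four bytes of a file and report its BOM, if any."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

Calling `sniff_file('test.json')` on a file saved in Notepad's "Unicode" mode should report `UTF-16 LE`.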

If you don't still have your unedited JSON handy, you'll need to remove the BOM. Personally I'd use Emacs, but another answer suggested this stand-alone Windows .exe:

http://www.bryntyounce.com/filebomdetector.htm

answered Jul 24 '13 at 20:06

Peter DeGlopper

Peter, thank you for your reply. I cannot use Emacs since I have Windows 7. I installed the utility you suggested and ran it. Indeed, it shows that all files but one doctored by Notepad are UTF-16. However, after running the utility I still get the same “UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte” – Elena Kr Jul 25 '13 at 15:20
Step 1: convert to UTF-8. Step 2: Remove the BOM. – Peter DeGlopper Jul 25 '13 at 17:50
"I cannot use emacs since I have Windows7": Yes, you can. https://www.gnu.org/software/emacs/download.html – pst Aug 20 '16 at 12:16

score 4 · Answer 3 · answered Nov 18 '22 at 14:04

On Windows, running the standard dumpdata command with -Xutf8 has always solved this problem for me:

python -Xutf8 manage.py dumpdata app.mymodel > app/fixtures/mymodel.json

Here is an article for reference: https://dev.to/methane/python-use-utf-8-mode-on-windows-212i
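The `-X utf8` option turns on Python's UTF-8 Mode (available from Python 3.7). As a rough way to see what a given interpreter invocation will use, the sketch below reports the relevant settings; the function name `encoding_report` is hypothetical, and the values it returns depend on your machine and on how Python was started.

```python
import locale
import sys


def encoding_report():
    """Collect the settings that decide how text written to stdout is encoded."""
    return {
        # True when Python was started with -X utf8 or PYTHONUTF8=1.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding Python uses for stdout (and therefore for `> file`).
        "stdout_encoding": sys.stdout.encoding,
        # Default used when opening text files without an explicit encoding.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(key, "=", value)
```

Saving this as a script and running it once with plain `python` and once with `python -Xutf8` makes the difference visible.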

score 3 · Answer 4 · answered Jan 29 '22 at 10:54

After some research, I found the solution. In my case, the datadump.json file itself was the problem.

- Open the file in Notepad
- Click the "Save as" option
- In the encoding section at the bottom, select "UTF-8"
- Save the file.

Now you can try running the command. You are good to go :)

For reference, the original answer attached three screenshots: Notepad, Save as, UTF-8.

score 2 · Answer 5 · answered Jan 14 '20 at 02:22

I encountered the same problem when loading data. It is a problem with encodings. Install Notepad++ and change the encoding format to UTF-8.

In the lower right corner you can see the current encoding. If it is not UTF-8, you can simply change it to UTF-8 from the Encoding menu.

This solution worked for me.

original post

score 1 · Answer 6 · edited Jul 13 '19 at 03:50

I found one way to solve this issue: manually write out a new binary JSON file with the following code. Here rb stands for "read, binary" and wb for "write, binary".

First, go to the shell:

python manage.py shell

Second, rewrite test.json to a new binary file:

# Read the original fixture as raw bytes.
with open('path/to/test.json', 'rb') as f:
    data = f.read()

# Write those bytes to a new file; the with block closes it for us.
with open('newfile.json', 'wb') as newdata:
    newdata.write(data)

exit()

Then you can load the file:

python manage.py loaddata newfile.json

The above code works for me. I hope it helps you as well.
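Before running loaddata on a rewritten file, it may help to confirm that the file really is BOM-free UTF-8 and parses as JSON, since those are the two failures reported in the question. The following is a sketch only, assuming Python 3 and assuming loaddata wants plain UTF-8; the function name `check_fixture` and its return strings are made up for illustration.

```python
import json


def check_fixture(path):
    """Return 'ok', or a short description of the first problem found."""
    with open(path, "rb") as f:
        raw = f.read()

    # A UTF-16 file fails here, typically on byte 0xff at position 0.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return "not UTF-8: %s" % exc

    # A leading U+FEFF means the file was saved as UTF-8 with a BOM.
    if text.startswith("\ufeff"):
        return "UTF-8 with BOM"

    # json.JSONDecodeError is a subclass of ValueError.
    try:
        json.loads(text)
    except ValueError as exc:
        return "invalid JSON: %s" % exc

    return "ok"
```

For example, `check_fixture('newfile.json')` returning anything other than `ok` suggests the file still needs re-encoding.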

score 1 · Answer 7 · answered Jan 30 '20 at 18:45

If you are using a newer version of Windows 10, you can use Notepad to change the encoding from UTF-16 to UTF-8 simply by saving the file again and selecting the encoding option in the Save dialog. See the example image below.

Caleb Kandoro

Can you please link to the image? – alias51 Aug 06 '20 at 15:34
Wondering why Django's manage.py dumpdata saves it in UTF-16 to begin with. Does anyone know? – John Mc Jul 07 '22 at 04:10
