5

I wish to load initial data using fixtures as described here:

https://docs.djangoproject.com/en/dev/howto/initial-data/

This would be easy enough with a small data set. However, I wish to load a large CSV that will not fit into memory. How would I go about serializing this to a large JSON fixture? Do I have to hack it by manually writing the opening '[' and closing ']', or is there a cleaner way of doing this?
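To make clearer what I mean by the hack: a rough sketch would be to stream the CSV one row at a time and write the '[' , ']' and commas by hand, something like the following, where the model label `myapp.count` and the direct row-to-fields mapping are placeholders:

import csv
import json

# Stream the CSV row by row and write the fixture JSON by hand, so the whole
# data set never has to fit in memory. "myapp.count" and the row -> fields
# mapping are placeholders for illustration only.
with open("data.csv", newline="") as src, open("fixture.json", "w") as dst:
    dst.write("[\n")
    first = True
    for pk, row in enumerate(csv.DictReader(src), start=1):
        if not first:
            dst.write(",\n")
        first = False
        json.dump({"model": "myapp.count", "pk": pk, "fields": row}, dst)
    dst.write("\n]\n")

This is the kind of manual bracket-writing I would like to avoid if there is a cleaner option.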

alecxe
Hoa
  • I've done some searching and there seems to be little content on the web about this topic. You can write custom code to solve this, either using direct database access or executing the script within the Django context (which I recommend for integrity reasons). If you provide some more info maybe we could help with the script. – Paulo Bu May 16 '13 at 13:19
  • A couple of useful functions: http://stackoverflow.com/questions/2400643/is-there-a-memory-efficient-and-fast-way-to-load-big-json-files-in-python and http://stackoverflow.com/questions/10715628/opening-a-large-json-file-in-python – Rohan May 16 '13 at 13:46

2 Answers

1

I realize this is quite old, but I just had the same issue.

Using this post as reference:

Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?

I split my original large array of JSON objects into individual objects, one per file, like so:

 jq -c '.[]' fixtures/counts_20210517.json | \
 awk '{print > "fixtures/counts_split/doc00" NR ".json";}'      

and then looped over the files, added square brackets to the beginning and end of each one, and called `manage.py loaddata` on that file:

for file in fixtures/counts_split/*.json; do
    echo "loading ${file}"
    # wrap the single object in '[' and ']' so loaddata gets a valid JSON array
    sed -i '1s/^/[/' "$file"
    sed -i '1s/$/]/' "$file"
    python manage.py loaddata "$file"
done
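Note that this works because `jq -c '.[]'` writes each object compactly on a single line, so every split file contains exactly one line; prepending '[' and appending ']' to that line turns it into a valid one-element JSON array that `loaddata` accepts.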

oguzkhan
  • Any idea how to modify the `jq` command to group more than one object per file? – Chris Mar 27 '23 at 08:04
0

Seeing that you are starting with a CSV file, you could create a custom management command. Within the command you can read the CSV file, create the objects and save them to the database. As long as you process the CSV one line at a time inside the loop, you will not run into memory issues.

Relevant documentation can be found here:

http://docs.python.org/2/library/csv.html
https://docs.djangoproject.com/en/dev/howto/custom-management-commands/
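A minimal sketch of such a command might look like the following (the app name, the `Count` model, and the CSV column names are assumptions; adapt them to your own schema):

# myapp/management/commands/load_counts.py -- path and names chosen for illustration
import csv

from django.core.management.base import BaseCommand

from myapp.models import Count  # hypothetical model


class Command(BaseCommand):
    help = "Load a large CSV into the database one row at a time"

    def add_arguments(self, parser):
        parser.add_argument("csv_path", help="Path to the CSV file")

    def handle(self, *args, **options):
        with open(options["csv_path"], newline="") as f:
            # DictReader yields one row at a time, so memory use stays flat
            for row in csv.DictReader(f):
                Count.objects.create(
                    name=row["name"],         # assumed column
                    value=int(row["value"]),  # assumed column
                )
        self.stdout.write(self.style.SUCCESS("CSV loaded"))

You would then run it with something like `python manage.py load_counts path/to/data.csv`.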

Simon Luijk