
I'm pretty new to Python. I have about 21 JSON files to send with a POST request. Unfortunately, the service to which I'm trying to send these files (Qualtrics) only accepts files up to 5 MB. Because of that, I need to split these JSON files at roughly every 4.5 MB (just to be sure). This Python script is part of a data stream, so if it fails, the next steps will not be executed.

So, what I have now and what my script does is:

  • Convert from CSV to JSON (a rough sketch of this step follows the list).
    • The JSON format is [{"key1":"value","key2":"value",...},{...}]
  • Send a POST request with that JSON.
    • The script fails if the file size is > 5 MB.
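
For context, the conversion step looks roughly like this. The file name and columns are placeholders, and it only uses the standard-library csv and json modules (I can't install anything else):

import csv
import json

# Placeholder file name; each CSV row becomes one JSON object.
with open('input.csv', newline='') as f:
    rows = list(csv.DictReader(f))

# Minified output: [{"key1":"value","key2":"value",...},{...}]
payload = json.dumps(rows, separators=(',', ':'))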

I already tried to search for a solution but I wasn't lucky. I can split the CSV, but since a 2 MB CSV can turn into a 5.5 MB JSON, I can't rely on the CSV size alone.

Do you guys have any suggestion? Another thing that blocks me is that I'm working on a Linux server that is not mine, so I haven't been granted permission to install additional libraries, and even when I ask for it the answer is no...

Thank you very much.

Roxoradev
  • You should be able to use something like this to check the file size, but you would have to figure out a way to create temp files while you check the size. https://stackoverflow.com/questions/2104080/how-to-check-file-size-in-python – janDro Dec 19 '18 at 14:06
  • Is each line in the file a correctly formatted JSON string? You might be better off just truncating each line of the file as you read it and sending each POST individually. https://www.tutorialspoint.com/python/file_truncate.htm – janDro Dec 19 '18 at 14:09
  • @janDro It is a valid JSON but "minified". Everything in the first line. – Roxoradev Dec 19 '18 at 14:12
  • Ok so each file contains exactly one minified JSON string? Does Qualtrics provide other API endpoints where you could split up the JSON into smaller domains? If we split the JSON just based on size then you run the risk of sending invalid JSON to the API right? – janDro Dec 19 '18 at 14:20

1 Answer


Assuming that your JSON is an array of objects like this:

[
   {"key1": "value1", "key2": "value2", ...},
   ...
   {"key1": "value1", "key2": "value2", ...}
]

Then you could build your payload manually like this:

def send_entries(entries):
    # Join the already-serialized JSON objects into one JSON array payload
    payload = '[' + ','.join(entries) + ']'
    post_payload(payload)  # the actual HTTP POST goes here

json_entries = []
total_size = 0

for line in csv:                        # iterate over the CSV rows
    json_entry = convert_to_json(line)  # your CSV-row-to-JSON conversion
    json_entries.append(json_entry)
    total_size += len(json_entry)
    # Flush the batch once it reaches ~4.5 MB, keeping headroom under the 5 MB limit
    if total_size >= 4_500_000:
        send_entries(json_entries)
        total_size = 0
        json_entries = []

# Send whatever is left after the loop
if json_entries:
    send_entries(json_entries)

The actual size of the payload might be a bit larger than 4.5 MB (the commas, brackets and the last entry that pushed it over the threshold are not counted), but as long as each entry in your CSV converts to less than 500 KB of JSON you should stay under the 5 MB limit.
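
Since you mentioned you can't install additional libraries, the actual POST inside send_entries can be done with urllib.request from the standard library. A rough sketch, where the URL and headers are placeholders for whatever Qualtrics actually expects:

import urllib.request

def post_payload(payload):
    # Placeholder URL and headers; adjust to the real Qualtrics endpoint and auth.
    req = urllib.request.Request(
        'https://example.qualtrics.com/endpoint',
        data=payload.encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()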

Cesar Canassa