
I have tried many approaches and tested many scenarios, and I did a lot of research, but I am unable to find the issue or a solution.

I have a requirement: the HubSpot API accepts only 15k records per request, and we have a large JSON file, so we need to split it into batches of 15k records. Each batch of 15k records should be sent to the API once; after each post, the script should sleep for 10 seconds and capture the response, and the process should continue until all records are finished.

I tried with chunking code and the modulus operator, but didn't get any response.

I am not sure whether the code below works or not; can anyone please suggest a better way?

How do I send batches to the HubSpot API? How do I post them?

Thanks in advance; this would be a great help for me!

import requests
import urllib3

with open(r'D:\Users\lakshmi.vijaya\Desktop\Invalidemail\allhubusers_data.json', 'r') as run:
    dict_run = run.readlines()
    dict_ready = ''.join(dict_run)
    count = 1000
    # Slice the raw JSON text into 1000-character pieces
    subsets = (dict_ready[x:x + count] for x in range(0, len(dict_ready), count))
    url = 'https://api.hubapi.com/contacts/v1/contact/batch'
    headers = {'Authorization': "Bearer pat-na1-**************************",
               'Accept': 'application/json',
               'Content-Type': 'application/json',
               'Transfer-encoding': 'chunked'}
    urllib3.disable_warnings()
    for subset in subsets:
        # print(subset)
        r = requests.post(url, data=subset, headers=headers, verify=False,
                          timeout=(15, 20), stream=True)
        print(r.status_code)
        print(r.content)

ERROR: 400 b'<html>\r\n<head><title>400 Bad Request</title></head>\r\n<body>\r\n<center><h1>400 Bad Request</h1></center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>\r\n'
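
For reference: a fixed-size character slice of a JSON document is almost never valid JSON on its own, which seems consistent with the 400 above. A minimal check, using a stand-in sample string rather than the real file:

import json

# Slicing serialized JSON by character count cuts objects mid-token,
# so the resulting chunk fails to parse on its own.
doc = json.dumps([{"email": "a@example.com"}, {"email": "b@example.com"}])
chunk = doc[:20]
print(repr(chunk))  # '[{"email": "a@exampl'
try:
    json.loads(chunk)
except json.JSONDecodeError as e:
    print('invalid JSON chunk:', e)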

This is the other method I tried:

import requests
import urllib3

with open(r'D:\Users\lakshmi.vijaya\Desktop\Invalidemail\allhubusers_data.json', 'r') as run:
    dict_run = run.readlines()
    dict_ready = ''.join(dict_run)
    url = 'https://api.hubapi.com/contacts/v1/contact/batch'
    headers = {'Authorization': "Bearer pat-na1***********-",
               'Accept': 'application/json',
               'Content-Type': 'application/json',
               'Transfer-encoding': 'chunked'}

    urllib3.disable_warnings()
    # Post the entire file body in a single request
    r = requests.post(url, data=dict_ready, headers=headers, verify=False,
                      timeout=(15, 20), stream=True)
    r.iter_content(chunk_size=1000000)
    print(r.status_code)
    print(r.content)

ERROR: raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='api.hubapi.com', port=443): Max retries exceeded with url: /contacts/v1/contact/batch (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2396)')))

This is how the JSON data looks in the large JSON file:

{
    "email": "aaazaj21@yahoo.com",
    "properties": [
        {
            "property": "XlinkUserID",
            "value": 422211111
        },
        {
            "property": "register_time",
            "value": "2021-09-02"
        },
        {
            "property": "linked_alexa",
            "value": 1
        },
        {
            "property": "linked_googlehome",
            "value": 0
        },
        {
            "property": "fan_speed_switch_0x51_",
            "value": 2
        }
    ]
},
{
    "email": "zzz7@gmail.com",
    "properties": [
        {
            "property": "XlinkUserID",
            "value": 13333666
        },
        {
            "property": "register_time",
            "value": "2021-04-24"
        },
        {
            "property": "linked_alexa",
            "value": 1
        },
        {
            "property": "linked_googlehome",
            "value": 0
        },
        {
            "property": "full_colora19_st_0x06_",
            "value": 2
        }
    ]
}
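
Note that two top-level objects separated by a comma, as above, do not form one valid JSON document, so json.load() cannot read the file as-is. A quick check with a stand-in string:

import json

# Comma-separated top-level objects are not a single valid JSON value.
sample = '{"email": "a@example.com"}, {"email": "b@example.com"}'
try:
    json.loads(sample)
except json.JSONDecodeError as e:
    print('invalid JSON:', e)  # "Extra data" at the comma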

I try with adding list of objects

[
{
    "email": "aaazaj21@yahoo.com",
    "properties": [
        {
            "property": "XlinkUserID",
            "value": 422211111
        },
        {
            "property": "register_time",
            "value": "2021-09-02"
        },
        {
            "property": "linked_alexa",
            "value": 1
        },
        {
            "property": "linked_googlehome",
            "value": 0
        },
        {
            "property": "fan_speed_switch_0x51_",
            "value": 2
        }
    ]
},
{
    "email": "zzz7@gmail.com",
    "properties": [
        {
            "property": "XlinkUserID",
            "value": 13333666
        },
        {
            "property": "register_time",
            "value": "2021-04-24"
        },
        {
            "property": "linked_alexa",
            "value": 1
        },
        {
            "property": "linked_googlehome",
            "value": 0
        },
        {
            "property": "full_colora19_st_0x06_",
            "value": 2
        }
    ]
}
]
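
Wrapped in square brackets like this, the data is one JSON array, so json.load() returns a Python list of dictionaries. The same stand-in check now parses:

import json

sample = '[{"email": "a@example.com"}, {"email": "b@example.com"}]'
records = json.loads(sample)
print(type(records), len(records))  # <class 'list'> 2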

1 Answer


You haven't said whether your JSON file is a representation of an array of objects or just one object. Arrays are converted to Python lists by json.load() and objects are converted to Python dictionaries.

Here is some code that assumes it is an array of objects; if it is not an array of objects, see https://stackoverflow.com/a/22878842/839338, but the same principle can be used.

This assumes you want 15k bytes, not records. If it is the number of records, you can simplify the code and just pass 15000 as the second argument to chunk_list().

import json
import math
import pprint


# See https://stackoverflow.com/a/312464/839338
def chunk_list(list_to_chunk, number_of_list_items):
    """Yield successive chunk_size-sized chunks from list."""
    for i in range(0, len(list_to_chunk), number_of_list_items):
        yield list_to_chunk[i:i + number_of_list_items]


with open('./allhubusers_data.json', 'r') as run:
    json_data = json.load(run)
    desired_size = 15000
    json_size = len(json.dumps(json_data))
    print(f'{json_size=}')
    print(f'Divide into {math.ceil(json_size/desired_size)} sub-sets')
    print(f'Number of list items per subset = {len(json_data)//math.ceil(json_size/desired_size)}')
    if isinstance(json_data, list):
        print("Found a list")
        sub_sets = chunk_list(json_data, len(json_data)//math.ceil(json_size/desired_size))
    else:
        exit("Data not list")
    for sub_set in sub_sets:
        pprint.pprint(sub_set)
        print(f'Length of sub-set {len(json.dumps(sub_set))}')
        # Do stuff with the sub sets...
        text_subset = json.dumps(sub_set)  # ...

You may need to adjust the value of desired_size downwards if the sub_sets vary in text length.
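
For example, one way to check the largest serialized sub-set a given item count produces (a sketch; sample_data is hypothetical stand-in data, not the real file):

import json


def chunk_list(list_to_chunk, number_of_list_items):
    """Yield successive number_of_list_items-sized chunks from list_to_chunk."""
    for i in range(0, len(list_to_chunk), number_of_list_items):
        yield list_to_chunk[i:i + number_of_list_items]


# Hypothetical stand-in for the real parsed list
sample_data = [{"email": f"user{i}@example.com"} for i in range(1000)]
items_per_chunk = 100
largest = max(len(json.dumps(c)) for c in chunk_list(sample_data, items_per_chunk))
print(f'Largest serialized sub-set: {largest} bytes')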

UPDATED IN RESPONSE TO COMMENT: If you just need 15,000 records per request, this code should work for you:

import json
import pprint
import requests


# See https://stackoverflow.com/a/312464/839338
def chunk_list(list_to_chunk, number_of_list_items):
    """Yield successive chunk_size-sized chunks from list."""
    for i in range(0, len(list_to_chunk), number_of_list_items):
        yield list_to_chunk[i:i + number_of_list_items]


url = 'https://api.hubapi.com/contacts/v1/contact/batch'
headers = {
    'Authorization': "Bearer pat-na1-**************************",
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'Transfer-encoding': 'chunked'
}

with open(r'D:\Users\lakshmi.vijaya\Desktop\Invalidemail\allhubusers_data.json', 'r') as run:
    json_data = json.load(run)
    desired_size = 15000
    if isinstance(json_data, list):
        print("Found a list")
        sub_sets = chunk_list(json_data, desired_size)
    else:
        exit("Data not list")
    for sub_set in sub_sets:
        # pprint.pprint(sub_set)
        print(f'Length of sub-set {len(sub_set)}')
        r = requests.post(
            url,
            data=json.dumps(sub_set),
            headers=headers,
            verify=False,
            timeout=(15, 20),
            stream=True
        )
        print(r.status_code)
        print(r.content)
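
The question also asks to sleep 10 seconds per batch and capture each response. As a sketch, the for loop in the code above could be replaced with something like the following (reusing url, headers, sub_sets and json from the code above):

import time

responses = []  # capture each batch's response
for sub_set in sub_sets:
    time.sleep(10)  # pause 10 seconds before each post, per the requirement
    r = requests.post(url, data=json.dumps(sub_set), headers=headers,
                      verify=False, timeout=(15, 20))
    responses.append((r.status_code, r.content))
    print(r.status_code, r.content)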
Dan-Dev
  • Thanks for your answer. My JSON file has an array of objects, and json.load(run) was not working, so I did dict_run = run.readlines(), json_data = (''.join(dict_run)). I'm getting this output: json_size=212531931 Divide into 14169 sub-sets Number of list items per subset = 13618 Data not list – lAkShMipythonlearner Nov 30 '22 at 06:35
  • I posted the sample JSON format above; can you please check it? I just want to send the JSON data in batches to the HubSpot API and capture each response; this process should continue until all the objects are finished – lAkShMipythonlearner Nov 30 '22 at 06:39
  • Just use the code `json_data = json.load(run)` to load the file; don't convert it to text with `run.readlines()`. Also, do you want 15000 records per sub-set or 15000 bytes per sub-set? – Dan-Dev Nov 30 '22 at 09:14
  • Thanks for your response. Yes, I need 15k records each time, and each 15k should be posted to the HubSpot API. I tried to post to the API with warnings.filterwarnings('ignore', message='Unverified HTTPS request'); r = requests.post(url, data=text_subset, headers=headers, verify=False, timeout=(15,20), stream=True); print(r.status_code); print(r.content). Result: 202 b''. The status code is not 200, and I'm not sure it is posting 15k every time – lAkShMipythonlearner Nov 30 '22 at 11:54
  • I have updated the answer to use 15K records. You are getting a 202, which is good, as the request has been accepted; see https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/202. You should be able to tell the length of the sub-set from the output. – Dan-Dev Nov 30 '22 at 12:12
  • I just tried the updated code above; now I'm getting: Length of sub-set 4670599 Length of sub-set 4671779 Length of sub-set 4662799 Length of sub-set 4676840, then raise SSLError(e, request=request) requests.exceptions.SSLError: HTTPSConnectionPool(host='api.hubapi.com', port=443): Max retries exceeded with url: /contacts/v1/contact/batch (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2396)'))) – lAkShMipythonlearner Nov 30 '22 at 12:44
  • It was still printing the size in bytes; I have updated it, and now it should be `print(f'Length of sub-set {len(sub_set)}')`. Are you using an HTTP proxy? – Dan-Dev Nov 30 '22 at 12:52
  • I just removed 'Transfer-encoding': 'chunked' from the headers; now I'm getting Length of sub-set 15000 202 b'' Length of sub-set 15000 202 b''. Why am I not getting a 200 status code, why is r.content b'', and how do I add a time.sleep(10) for every post? – lAkShMipythonlearner Nov 30 '22 at 13:14
  • 202 is a successful response code like 200. The server is replying that it is okay and the data has been received; there is probably just no content, and b'' indicates bytes with no data, see https://stackoverflow.com/questions/6269765/what-does-the-b-character-do-in-front-of-a-string-literal. Just add the sleep before each call to requests.post() – Dan-Dev Nov 30 '22 at 13:31
  • That means the code is sending 15K records to the API every time, right? Thanks a lot! I've been struggling for a week to resolve this issue. Have a good day! – lAkShMipythonlearner Nov 30 '22 at 13:46