I am writing a Python script that performs the steps below:
- Query a MongoDB database
- Parse and aggregate the results
- Upload the data to a ServiceNow table via a REST API
The script works; however, the data set is too large and the REST transaction times out after 60 seconds (the connection is closed by the ServiceNow server at the destination).
I need to segment the data into chunks and send a separate REST transaction for each chunk so that the full data set is sent via POST without hitting the timeout limit.
How can I achieve that goal by modifying the script below?
#!/usr/bin/env python
from config import *
import os, sys
mypath = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(mypath, "api-python-client"))
from apiclient.mongo import *
from pymongo import MongoClient
import json
import requests
from bson.json_util import dumps
client = MongoClient(mongo_uri)
#Create ServiceNow URL
svcnow_url = create_svcnow_url('u_imp_cmps')
#BITSDB Nmap Collection
db = client[mongo_db]
#Aggregate - RDBMS equivalent to Alias select x as y
#Rename fields to match ServiceNow field names
computers = db['computer'].aggregate([
    {"$unwind": "$hostnames"},
    {"$project": {
        "_id": 0,
        "u_hostname": "$hostnames.name",
        "u_ipv4": "$addresses.ipv4",
        "u_status": "$status.state",
        "u_updated_timestamp": "$last_seen"
    }}
])
j = dumps({"records":computers})
#print(j)
#Set proper headers
headers = {"Content-Type":"application/json","Accept":"application/json"}
#Build HTTP Request
response = requests.post(url=svcnow_url, auth=(svcnow_user, svcnow_pwd), headers=headers, data=j)
#Check for HTTP codes other than 200
if response.status_code != 200:
    print('Status:', response.status_code, 'Headers:', response.headers, 'Response Text:', response.text, 'Error Response:', response.json())
    exit()
#Decode the JSON response into a dictionary and use the data
print('Status:',response.status_code,'Headers:',response.headers,'Response:',response.json())
UPDATE: I have a plan, but I am not sure exactly how to implement it.
- Set the cursor to a fixed batch size of 1000 records (see the snippet after the link below)
- When a batch is full, build the JSON output and send the data via requests
- In a loop, keep grabbing new batches and sending each one to the destination until the entire data set has been sent
https://docs.mongodb.com/v3.0/reference/method/cursor.batchSize/
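If I am reading the docs correctly, batchSize only controls how many documents the driver fetches from MongoDB per round trip; it does not split the upload itself, so the chunking of the POSTs would still have to happen in my own code. For reference, this is how I assume the option would be passed in pymongo (same pipeline as in the script above, and 1000 is just a guess):

#Assumption on my part: pymongo forwards extra keyword arguments such as
#batchSize to the aggregate command, so the driver pulls 1000 documents
#per round trip from MongoDB.
pipeline = [
    {"$unwind": "$hostnames"},
    {"$project": {
        "_id": 0,
        "u_hostname": "$hostnames.name",
        "u_ipv4": "$addresses.ipv4",
        "u_status": "$status.state",
        "u_updated_timestamp": "$last_seen"
    }}
]
computers = db['computer'].aggregate(pipeline, batchSize=1000)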
Basically, I think I can solve this by creating batches and looping through them, making a new API call for each batch. A rough, untested sketch of what I have in mind is below. Please let me know if this is a good plan and how to implement a solution. Thanks.
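Here is my rough, untested sketch of the chunked upload. It reuses svcnow_url, svcnow_user, svcnow_pwd, headers, and the computers cursor from the script above; the post_records helper and the chunk size of 1000 are my own guesses and may need tuning:

import requests
from bson.json_util import dumps

CHUNK_SIZE = 1000  #guess at a size that stays under the 60-second timeout

def post_records(records):
    """POST one chunk of records to ServiceNow and return the response."""
    payload = dumps({"records": records})
    response = requests.post(url=svcnow_url, auth=(svcnow_user, svcnow_pwd), headers=headers, data=payload)
    if response.status_code != 200:
        print('Status:', response.status_code, 'Response Text:', response.text)
    return response

chunk = []
for doc in computers:          #iterate the aggregation cursor one document at a time
    chunk.append(doc)
    if len(chunk) == CHUNK_SIZE:
        post_records(chunk)    #flush a full chunk
        chunk = []

if chunk:                      #flush the final, partially filled chunk
    post_records(chunk)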