Problem
I am looking at some 2-5 million events of a time series, batch-queried daily from a REST API and fed into rows of a PostgreSQL database. I use an RDBMS because I want the time series pre-structured for faster analysis and exploration later on, and the input schema does not really change. So Cassandra or MongoDB are not an option, although they would happily accept the JSON as is.
The database runs on cloud infrastructure and PostgreSQL is accessed over the network. The database files themselves may reside on a network-attached file system, so I/O suffers from the corresponding latency and limited bandwidth.
Also, when running the crawler/importer as a serverless function, any solution on top of the JVM or Python carries significant overhead.
Objective
As the task itself is rather simple and linear, I am looking for a solution with a very small footprint, ideally without a dependency on any runtime or interpreter in the first place.
So a small bash script chaining httpie/curl, jq and the native psql client like the one below sounds promising:
#!/bin/bash
DATE=$(date +%Y-%m-%d)
http POST https://example.net/my/api/v2 date="$DATE" | \
  jq -r '<jq json filter>' | \
  psql -U myuser -d mydb
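For environments where httpie is not available, the same request could presumably be made with plain curl (assuming the API expects a JSON body, which is what httpie sends by default for POST):

curl -s -X POST https://example.net/my/api/v2 \
  -H 'Content-Type: application/json' \
  -d "{\"date\": \"$DATE\"}" | \
  jq -r '<jq json filter>' | \
  psql -U myuser -d mydb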
Question
To do just that, how do I convert multi-level JSON including arrays into a ready-to-run SQL INSERT statement using just jq?
Expected SQL statement:
INSERT INTO events (date,node,sensor,prop1,prop2,prop3,prop4) VALUES
('2020-09-10',4170,1,0,-1,0,0),
('2020-09-10',4170,1,0,-1,300,0),
....
('2020-09-10',8888,2,0,-1,0,0);
JSON input:
{
  "date": "2020-09-10",
  "events": [
    {
      "sensor": 1,
      "intervals": [
        {
          "prop1": 0,
          "prop2": -1,
          "prop3": 0,
          "prop4": 0
        },
        {
          "prop1": 0,
          "prop2": -1,
          "prop3": 300,
          "prop4": 0
        }
      ],
      "node": 4170
    },
    {
      "sensor": 2,
      "intervals": [
        {
          "prop1": 0,
          "prop2": -1,
          "prop3": 0,
          "prop4": 0
        }
      ],
      "node": 8888
    }
  ]
}
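For illustration, here is a sketch of the kind of filter I have in mind, kept in its own file (events2sql.jq is just a name I made up) to avoid shell-quoting trouble with the embedded single quotes. It hard-codes the column list and assumes the field names from the sample above, and I am not sure it is robust or idiomatic:

# Flatten date -> events[] -> intervals[] into one VALUES tuple per interval.
.date as $d
| [ .events[]
    | .node as $n
    | .sensor as $s
    | .intervals[]
    | "('\($d)',\($n),\($s),\(.prop1),\(.prop2),\(.prop3),\(.prop4))"
  ]
| "INSERT INTO events (date,node,sensor,prop1,prop2,prop3,prop4) VALUES\n"
  + join(",\n")
  + ";"

It would be called as jq -r -f events2sql.jq in place of the <jq json filter> placeholder above.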