
Every 20 minutes, I run an ELT process (Talend) that takes some REST API response files (JSON), plus some other JSON fields from a PostgreSQL server.

I've been processing this with Talend, using JSONPath to extract and flatten the rows, so I end up with a CSV that I can freely load into BQ.

I don't feel this is the best way to do it: I iterate the pipes about 2,000 times to make sure every value ends up as a row in the output file, and then trim the NULL ones so that only rows with proper values get loaded. This is dirty and slow.
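For comparison, the flatten-then-drop-NULLs step can be done in one pass per record outside Talend. This is only a minimal sketch with hypothetical field names (the real payload has a nested "properties" object): it recursively flattens a nested JSON record into dotted-path columns and keeps only the keys with non-null values, instead of emitting everything and trimming afterwards.

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into dotted-path keys."""
    rows = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            rows.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            rows.update(flatten(value, f"{prefix}{index}."))
    else:
        rows[prefix.rstrip(".")] = obj
    return rows

# Hypothetical API hit, not the real schema.
hit = json.loads('{"id": 1, "properties": {"email": "a@b.c", "phone": null}}')

flat = flatten(hit)
# Drop NULLs up front instead of trimming them in a later pass.
row = {k: v for k, v in flat.items() if v is not None}
print(row)  # {'id': 1, 'properties.email': 'a@b.c'}
```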

I've always thought about switching to a cloud-based ETL (Stitch, Fivetran, Xplenty, Alooma), but I don't know whether those tools can do this kind of transformation or load the JSON into BQ.

I've also explored the option of loading the JSON as a string into BQ and then using JSON_EXTRACT_SCALAR to pull out the info I need as pure KPI rows, but I don't know if it will work.
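For reference, BigQuery's JSON_EXTRACT_SCALAR takes a JSONPath like `$.properties.email` and returns the value only when it is a scalar (NULL for objects and arrays). A rough local approximation in Python, with made-up field names, just to illustrate the behavior:

```python
import json

def extract_scalar(json_string, path):
    """Approximate BigQuery's JSON_EXTRACT_SCALAR: follow a $.a.b path,
    return the value only if it is a scalar, else None."""
    value = json.loads(json_string)
    for key in path.lstrip("$.").split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return None if isinstance(value, (dict, list)) else value

raw = '{"properties": {"email": "a@b.c", "tags": ["x", "y"]}}'
print(extract_scalar(raw, "$.properties.email"))  # a@b.c
print(extract_scalar(raw, "$.properties.tags"))   # None (not a scalar)
```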

The JSON I need to load isn't a plain set of key/value pairs; it looks like this:

https://i.stack.imgur.com/3dgPs.png

(Zoom in on "Properties". It's test data, so there is no real information here, but some of the test emails and names have been censored anyway.)

Side note: I've tried to contact the Alooma folks by email, on LinkedIn, even through some Google employee friends of mine, and it's impossible. I just want to try their product, or even pay for it, but they seem to have vanished since Google bought them. Did they die?

(If anyone is interested, this is a sample of 10 hits of the array: https://www.codepile.net/pile/q2ypDOxQ )

  • Did you try `jq`? https://stackoverflow.com/a/54658959/132438 – Felipe Hoffa Apr 29 '20 at 03:13
  • That won't work outside the ELT/ETL software. Maybe, if nobody knows a better answer, but I would love to just find a cloud ETL that can do this. The main point is getting rid of the Compute Engine machine in the ETL; if I have to use an external command I will need a place to run it, so why switch from Talend... – Alex Montejo Apr 29 '20 at 13:38
  • For complicated JSON, I would import it whole as a STRING to BigQuery, and then parse inside BigQuery – Felipe Hoffa Apr 29 '20 at 22:12
  • How exactly can I parse an array of JSON objects in BQ? – Alex Montejo Apr 30 '20 at 01:09
  • We could use Stack Overflow for that https://stackoverflow.com/questions/45315359/how-to-convert-an-array-extracted-from-a-json-string-field-to-a-bigquery-repeate – Felipe Hoffa Apr 30 '20 at 03:07
  • I don't think that applies in this case, because the JSON to be loaded/processed is so different in the two scenarios; I don't see BQ being able to use SCALAR on an array with index numbers inside the JSON... Anyway, I asked here to gather all the information regarding the solution. – Alex Montejo Apr 30 '20 at 09:40
  • Note that inside BQ you can use JavaScript to parse any kind of JSON with total freedom – Felipe Hoffa Apr 30 '20 at 21:46
  • That is still something outside the ELT; I am processing in the ELT layer, so why move to that solution if I am already doing it? The answer is the core of this question: I need to do it in a cloud ELT. – Alex Montejo May 01 '20 at 17:23

0 Answers