
I have several large JSON objects (think GB scale), where the object values at some of the innermost levels are arrays of objects. I'm using jq 1.4 and I'm trying to break these arrays up into individual objects, each keyed by something like g__0 or g__1, where the numbers correspond to the indices in the original array, as returned by the keys function. The number of objects in each array may be arbitrarily large (in my example it is 3). At the same time, I want to keep the rest of the structure intact.

For what it's worth, the original structure comes from MongoDB, but I am unable to change it at this level. I will then use this JSON file to create a schema for BigQuery, where an example column will be seeds.g__1.guid, and so on.

What I have:

{
  "port": 4500,
  "notes": "This is an example",
  "seeds": [
    {
      "seed": 12,
      "guid": "eaf612"
    },
    {
      "seed": 23,
      "guid": "bea143"
    },
    {
      "seed": 38,
      "guid": "efk311"
    }
  ]
}

What I am hoping to achieve:

{
  "port": 4500,
  "notes": "This is an example",
  "seeds": {
    "g__0": {
      "seed": 12,
      "guid": "eaf612"
    },
    "g__1": {
      "seed": 23,
      "guid": "bea143"
    },
    "g__2": {
      "seed": 38,
      "guid": "efk311"
    }
  }
}

Thanks!

– adaris

2 Answers


The following jq program should do the trick. At least it produces the desired results for the given JSON. The program is so short and straightforward that I'll let it speak for itself:

# Build an object whose keys are the prefix followed by the original array index.
def array2object(prefix):
  . as $in
  | reduce range(0; length) as $i ({}; .["\(prefix)\($i)"] = $in[$i]);

.seeds |= array2object("g__")
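
For reference, a minimal invocation sketch, assuming the filter is saved as to_object.jq and the input as input.json (both filenames are hypothetical):

jq -f to_object.jq input.json > output.json

And if the same conversion has to be applied to every such array rather than just .seeds, a sketch along the following lines should work. The "first element is an object" test is only a heuristic, and walk has to be defined by hand, since it is not built in to jq 1.4:

# walk/1 applies f to every subtree, bottom-up.
def walk(f):
  . as $in
  | if type == "object" then
      reduce keys[] as $key ({}; . + {($key): ($in[$key] | walk(f))}) | f
    elif type == "array" then map(walk(f)) | f
    else f
    end;

def array2object(prefix):
  . as $in
  | reduce range(0; length) as $i ({}; .["\(prefix)\($i)"] = $in[$i]);

# Convert every array whose first element is an object; leave everything else alone.
walk(if type == "array" and length > 0 and (.[0] | type) == "object"
     then array2object("g__") else . end)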
– peak
Thanks peak, I tried this against a couple of different scenarios and it always worked well. – adaris Mar 15 '17 at 19:04

So, you essentially want to transpose (pivot) your data in the BigQuery table, so that instead of having the data in rows, as below,

[screenshot: the sample data laid out as rows]

you will have your data in columns, as below:

[screenshot: the same data pivoted into columns]

Thus, my recommendation would be to first load your data as is. Then, instead of doing the schema transformation outside of BigQuery, let's rather do it within BigQuery!

Below is an example of how to achieve the transformation you are looking for (assuming you have at most three items/objects in the array):

#standardSQL
SELECT 
  port, notes, 
  STRUCT(
    seeds[SAFE_OFFSET(0)] AS g__0, 
    seeds[SAFE_OFFSET(1)] AS g__1, 
    seeds[SAFE_OFFSET(2)] AS g__2
  ) AS seeds 
FROM yourTable   

You can test this with dummy data using a CTE, as below:

#standardSQL
WITH yourTable AS (
  SELECT 
    4500 AS port, 'This is an example' AS notes,
    [STRUCT<seed INT64, guid STRING>
        (12, 'eaf612'), (23, 'bea143'), (38, 'efk311')
    ] AS seeds
  UNION ALL SELECT 
    4501 AS port, 'This is an example 2' AS notes,
    [STRUCT<seed INT64, guid STRING>
        (42, 'eaf412'), (53, 'bea153')
    ] AS seeds
)
SELECT 
  port, notes, 
  STRUCT(
    seeds[SAFE_OFFSET(0)] AS g__0, 
    seeds[SAFE_OFFSET(1)] AS g__1, 
    seeds[SAFE_OFFSET(2)] AS g__2
  ) AS seeds 
FROM yourTable   
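
Note that SAFE_OFFSET returns NULL for positions past the end of the array (unlike OFFSET, which raises an error), so in the second dummy row above, which has only two seeds, g__2 simply comes back as NULL.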

So, technically, if you know the maximum number of items/objects in the seeds array, you can just write the needed SQL statement manually and run it against the real data.
Hope this gives you the idea.
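
If the maximum is known, the repetitive STRUCT field list also does not have to be typed out by hand. As a sketch (not part of the SQL above), jq itself, which is already in play here, can generate the list; the maximum of 3 is an assumed parameter:

jq -n -r '[range(0; 3) | "seeds[SAFE_OFFSET(\(.))] AS g__\(.)"] | join(", ")'

This prints the exact field list used in the queries above, ready to paste into the STRUCT(...) expression.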

Of course, you can script/automate the process; you can find examples of similar pivoting tasks here:

https://stackoverflow.com/a/40766540/5221944
https://stackoverflow.com/a/42287566/5221944

– Mikhail Berlyant
Many thanks for the reply Mikhail. I did not choose your answer because in my case it is difficult to know the maximum number of items/objects in my array (and I also have an arbitrary number of such arrays), so I find it preferable to manipulate the row data. I agree, though, that this is a good approach at the BigQuery level. – adaris Mar 20 '17 at 14:58