I have a table in bigquery where I have object and for each object I have some stringified json. In json, an example row look like below:
{
"ObjectID": "1984931229",
"indexed_abstract": "{\"IndexLength\":123,\"InvertedIndex\":{\"Twenty-seven\":[0],\"metastatic\":[1,45],\"breast\":[2],\"adenocarcinoma\":[3],\"patients,\":[4]}}"
}
where inside the indexed_abstract
we have an InvertedIndex
which contains some keywords and how many times these keywords appeared in the ObjectID
.
Now I want to access the stringified json by parsing the json using bigquery and for each ObjectID
I want to create a nested field where I have the keyword, the corresponding array and the length of the corresponding array.
For example in this case the output would look like the following:
+------------+----------------+---------------+-------------------+
| ObjectID | keyword.key | keyword.count | keyword.positions |
+------------+----------------+---------------+-------------------+
| 1984931229 | Twenty-seven | 1 | [0] |
| | metastatic | 2 | [1,45] |
| | breast | 1 | [2] |
| | adenocarcinoma | 1 | [3] |
| | patients | 1 | [4] |
+------------+----------------+---------------+-------------------+
I understand I could use JSON_EXTRACT function but I am not sure what would be my key inside the inverted index to access the keywords and the arrays corresponding to them.