We are loading data into a fact table. The original temporary table on Snowflake looks like the following, where the `indicator_nbr` fields are questions asked within a survey.
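Roughly, the shape is as follows (the column names come from the script below; the `SRV_TEMP` name and the types are assumptions):

```sql
-- Hypothetical reconstruction of the temp table's shape. Column names are
-- taken from the script below; the types and the name SRV_TEMP are
-- assumptions (the script refers to it via TEMP_TABLE_NAME).
CREATE TEMPORARY TABLE SRV_TEMP (
    id                NUMBER,
    foreign_key_field NUMBER,
    indicator_1       VARCHAR, -- answer to survey question 1
    indicator_2       VARCHAR, -- answer to survey question 2
    indicator_3       VARCHAR  -- answer to survey question 3
);
```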
We are using data modelling techniques in building our warehouse database, so the data will be added into a fact table with one row per indicator value: first indicator 1 for every source row, then the same for indicators 2 and 3, and so on if there are other questions.
Each field with its value will be stored as a single row. Of course, there is other metadata to be added, like `load_dt` and `record_src`, but they are not a problem.
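So a single temp-table row fans out into one fact row per indicator, roughly like this (all values are made-up placeholders, hashes shortened):

```sql
-- Illustrative only: the three fact rows produced by one temp-table row
-- with answers 5, 3 and 4. Placeholder values, not real data.
SELECT 'c4ca42', '2021-01-01 00:00:00', NULL, 'SRC', '5', 'd3d944'
UNION ALL SELECT 'c4ca42', '2021-01-01 00:00:00', NULL, 'SRC', '3', 'd3d944'
UNION ALL SELECT 'c4ca42', '2021-01-01 00:00:00', NULL, 'SRC', '4', 'd3d944';
```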
The current script does the following:
- Get the fields into an array:

```javascript
fields_array = ['indicator_1', 'indicator_2', 'indicator_3'];
```
- A loop will run over the array and add each field with its value, row by row. So if we have 100 rows, we will run 300 inserts, one at a time:
```javascript
for (var col_num = 0; col_num < fields_array.length; col_num++) {
    var COL_NAME = fields_array[col_num];
    // Build one INSERT ... SELECT per field column of the temp table
    var field_value_query = "INSERT INTO SAT_FIELD_VALUE SELECT md5(id), CURRENT_TIMESTAMP(), NULL, 'SRC', " + COL_NAME + ", md5(foreign_key_field) FROM " + TEMP_TABLE_NAME;
    snowflake.execute({sqlText: field_value_query}); // assumed: the full script executes each statement
}
```
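For `indicator_1`, for example, the generated statement looks like this (with `TEMP_TABLE_NAME` resolving to the placeholder `SRV_TEMP`):

```sql
-- Statement built by the first loop iteration; SRV_TEMP is a placeholder name
INSERT INTO SAT_FIELD_VALUE
SELECT md5(id), CURRENT_TIMESTAMP(), NULL, 'SRC', indicator_1, md5(foreign_key_field)
FROM SRV_TEMP;
```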
As mentioned in the comment on this post showing the full script, it is better to loop over a string, concatenating each row's values into the `VALUES` list of a single insert query.
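Concretely, that suggestion yields one statement whose text grows with the row count, along these lines (placeholder values; the hashes would have to be computed while the string is built):

```sql
-- Sketch of the concatenation approach: one tuple per (row, field) pair,
-- so 100 rows x 3 fields = 300 tuples in a single statement's text.
-- All values are placeholders.
INSERT INTO SAT_FIELD_VALUE VALUES
  ('c4ca42', '2021-01-01 00:00:00', NULL, 'SRC', '5', 'd3d944'),
  ('c4ca42', '2021-01-01 00:00:00', NULL, 'SRC', '3', 'd3d944'),
  ('c4ca42', '2021-01-01 00:00:00', NULL, 'SRC', '4', 'd3d944');
```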
There are two issues with the suggested solution:

- There is a size limit for a query on Snowflake (it must be less than 1 MB);
- if we loop over each field and concatenate the values, we still need a SELECT query against the temp table to get each column's value, so there is no real optimization; we might reduce the time a little, but not by much.
EDIT: A possible solution
I was thinking of running a SQL query that selects everything from the temp table, doing the hashing and so on, and saving the result into an array after transposing it, but I have no idea how to do it.
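For example, would a single set-based statement like the following be the right direction? This is an untested sketch using `UNION ALL` rather than transposing into an array (`SRV_TEMP` again stands in for the temp table):

```sql
-- Untested sketch: one SELECT per indicator column, combined with UNION ALL.
-- The statement text grows with the number of columns, not the number of
-- rows, so the 1 MB query-size limit is not a concern, and Snowflake runs
-- it as one set-based insert instead of row-by-row statements.
INSERT INTO SAT_FIELD_VALUE
SELECT md5(id), CURRENT_TIMESTAMP(), NULL, 'SRC', indicator_1, md5(foreign_key_field) FROM SRV_TEMP
UNION ALL
SELECT md5(id), CURRENT_TIMESTAMP(), NULL, 'SRC', indicator_2, md5(foreign_key_field) FROM SRV_TEMP
UNION ALL
SELECT md5(id), CURRENT_TIMESTAMP(), NULL, 'SRC', indicator_3, md5(foreign_key_field) FROM SRV_TEMP;
```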