3

I'm trying to unload data from Snowflake to GCS and then load it from GCS into a BigQuery table. Here is the code for unloading the data from Snowflake:

```
copy into @unload_stage/FF_TBL_UNLOAD20200906.csv.gz from
(
  select *
  from SF_DB.SF_TBLS.FF_TBL_UNLOAD
)
file_format = (
  type = csv compression = 'gzip'
  FIELD_DELIMITER = '|'
  field_optionally_enclosed_by = '"'
  NULL_IF = ()
  EMPTY_FIELD_AS_NULL = TRUE
)
single = false
max_file_size = 5300000000
header = false;
```

Then I copy the data from GCS to BigQuery using the following script:

```
#!/bin/bash
date=20200906

echo "Removing FF_TBL_UNLOAD list with same date...."

rm -f /home/varma/FF_TBL_UNLOADlist"$date".txt

echo "Listing filenames for FF_TBL_UNLOAD in GCS bucket...."

gsutil ls gs://syw_hs_inbound_outbound_data/FF_TBL_UNLOAD"$date"*.gz >> /home/varma/FF_TBL_UNLOADlist"$date".txt

echo "Starting upload of data into table from FF_TBL_UNLOAD$date list..."
if [ -s /home/varma/FF_TBL_UNLOADlist"$date".txt ]
then
        while IFS= read -r line
        do
                echo "Uploading data for file $line"
                bq load --noreplace --field_delimiter="|" hty-ttw-analysis:out.FF_TBL_UNLOAD "$line"
        done < "/home/varma/FF_TBL_UNLOADlist${date}.txt"
else
        echo "File is Empty"
fi
```

It worked for all the tables except this one. The error I'm getting is: `Error while reading data, error message: CSV table references column position 174, but line starting at position:136868 contains only 94 columns.`
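For reference, this kind of column-count mismatch is typically produced when a quoted field contains an embedded newline and the loader splits rows on raw newlines. A minimal illustration (hypothetical three-column row; `awk` here stands in for a naive line-based parser):

```shell
#!/bin/bash
# A 3-column row whose middle field contains an embedded newline, as
# field_optionally_enclosed_by='"' would unload it (hypothetical data).
printf 'a|"line1\nline2"|c\n' > sample.csv

# A naive line-based parser sees two short rows instead of one 3-column row.
awk -F'|' '{ printf "row %d: %d fields\n", NR, NF }' sample.csv
# → row 1: 2 fields
# → row 2: 2 fields
```

This is exactly the symptom in the error above: a row "contains only 94 columns" because it was cut off at a newline inside a quoted value.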

Could anyone please help me with this error? Should I change the file format, or should I make changes to the script that uploads to BigQuery?

  • Could you please check how the NULL_IF field will be treated in BigQuery? It would be great if you could share some sample data – Sriga Sep 07 '20 at 04:57
  • Hi - which of the two processes is failing: the unload from Snowflake or the load into BQ? Assuming it is the load into BQ, and that there are multiple files, does it error on the first/every file or just on a specific file? Can you open the file in an application such as Excel or Notepad++ and see what the issue is? – NickW Sep 07 '20 at 10:50
  • Hi @SaivarmaMantena, I have posted the answer as Community wiki. So If my answer addressed your question, please consider upvoting and accepting it. If not, let me know so that the answer can be improved. Accepting an answer will help the community members with their research as well. – Prajna Rai T Mar 07 '22 at 14:04

2 Answers

2

As you said in the comment, using the command below solved the issue:

```
bq load --noreplace --field_delimiter="|" --null_marker=NULL --allow_quoted_newlines=TRUE --allow_jagged_rows=TRUE
```
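The line above shows only the flags; combined with the table and file arguments from the question's script, the full call would look like this (a sketch — the project, dataset, and bucket names come from the question, and the specific file path is illustrative):

```shell
#!/bin/bash
# Illustrative file path (hypothetical); in the real script this comes from the list file.
line="gs://syw_hs_inbound_outbound_data/FF_TBL_UNLOAD20200906_0_0_0.csv.gz"

# The original bq load call plus the flags from this answer.
# Building it as an array keeps the quoted "$line" intact as a single argument.
cmd=(bq load --noreplace --field_delimiter="|" \
     --null_marker=NULL --allow_quoted_newlines=TRUE --allow_jagged_rows=TRUE \
     hty-ttw-analysis:out.FF_TBL_UNLOAD "$line")
echo "${cmd[*]}"
```

`--allow_quoted_newlines` lets BigQuery treat quoted fields that span lines as one row, which matches the "contains only 94 columns" symptom; `--null_marker=NULL` controls which literal string is read back as SQL NULL.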
Prajna Rai T
  • I resolved this without the `allow_jagged_rows` option (which the docs say is for rows missing trailing optional columns). The `allow_quoted_newlines` flag is the relevant one for this error when it's caused by newlines inside quoted fields. – Louis Maddox Jul 11 '22 at 13:51
1

Based on the error message, I would say the issue is in this command:

```
bq load --noreplace --field_delimiter="|" hty-ttw-analysis:out.FF_TBL_UNLOAD $line
```

According to this and also this other question, this error message is displayed when there is an issue with the data being loaded that BigQuery can't parse properly or map to the table you want to populate. So the solution would be to add a validation/cleaning step for your CSV data before running this command, so it doesn't break your BigQuery load.
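One way to sketch such a validation step is a simple field count per line (assuming the table has 175 columns, since column positions are zero-based and the error references position 174 — verify against your schema). Note that this naive split deliberately miscounts quoted fields containing `|` or newlines, which makes it a quick way to surface exactly the rows that trip up the load:

```shell
#!/bin/bash
# Flag rows whose "|"-delimited field count differs from the expected column count.
# NOTE: this naive split does not honor quoting, so quoted fields containing '|'
# or embedded newlines are also flagged -- useful for spotting problem rows.
check_fields() {  # usage: some_command | check_fields <expected_field_count>
  awk -F'|' -v n="$1" 'NF != n { printf "line %d has %d fields\n", NR, NF }'
}

# Example on hypothetical 3-column data:
printf 'a|b|c\nx|y\n' | check_fields 3
# → line 2 has 2 fields
```

In the real pipeline this could run as `zcat FF_TBL_UNLOAD20200906*.csv.gz | check_fields 175` (file pattern and column count are assumptions) before calling `bq load`.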

To diagnose the issue further, sample data that reproduces the error message would help a lot, as other members of the community have pointed out.

S. Tyr
    Thanks for the reply. It worked for me with this command: `bq load --noreplace --field_delimiter="|" --null_marker=NULL --allow_quoted_newlines=TRUE --allow_jagged_rows=TRUE` – Saivarma Manthena Sep 08 '20 at 13:43