3

I am new to BigQuery. I am trying to load data into a GCP BigQuery table that I created manually. I have a bash file that contains this bq load command:

bq load --source_format=CSV --field_delimiter=$(printf '\u0001') dataset_name.table_name gs://bucket-name/sample_file.csv

My CSV file contains multiple rows with 16 columns each. A sample row is:

100563^3b9888^Buckname^https://www.settttt.ff/setlllll/buckkkkk-73d58581.html^Buckcherry^null^null^2019-12-14^23d74444^Reverb^Reading^Pennsylvania^United States^US^40.3356483^-75.9268747
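
For reference, here is a quick field count on that row. The ^ characters above may be literal, or may just be how the \u0001 byte rendered; if they are literal, that alone would explain the error further down. A minimal sanity check (the row string is copied from the sample above):

# Count the fields in the sample row under each candidate delimiter.
row = ("100563^3b9888^Buckname^https://www.settttt.ff/setlllll/"
       "buckkkkk-73d58581.html^Buckcherry^null^null^2019-12-14^"
       "23d74444^Reverb^Reading^Pennsylvania^United States^US^"
       "40.3356483^-75.9268747")

print(len(row.split("^")))     # 16 -- what the 16-column schema expects
print(len(row.split("\x01")))  # 1  -- what the load error reports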

Table schema:

[screenshot: BigQuery table schema]

When I execute the bash script from Cloud Shell, I get the following error:

Waiting on bqjob_r10e3855fc60c6e88_0000016f42380943_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'project-name-staging:bqjob_r10e3855fc60c6e88_0000ug00004521': Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Failure details:
- gs://bucket-name/sample_file.csv: Error while reading data, error message: CSV table references column position 15, but line starting at position:0 contains only 1 columns.

What would be the solution? Thanks in advance.

Bhagesh Arora
  • Can you provide an example of your source file and tell us whether your first row contains the headers or not? – Tamir Klein Dec 26 '19 at 13:31
  • @Tamir Klein No, there is no header in the CSV file. I already showed the first line of the CSV file above. – Bhagesh Arora Dec 26 '19 at 13:34
  • Sorry, I see your example. I also see that the 2nd field is defined as an Integer in your schema, but in your example you are attempting to insert a String (3b9888). You can try the --max_bad_records flag and avoid the default value, which is zero (0); a client-library sketch of this flag follows these comments. See this link for more information: https://cloud.google.com/bigquery/docs/reference/bq-cli-reference#bq_load – Tamir Klein Dec 26 '19 at 13:40
  • I specified --max_bad_records=20 in my bash file, but I am still getting errors: [gs://project-name-staging/sample_file.csv] Error while reading data, error message: CSV table references column position 15, but line starting at position:0 contains only 1 columns. [gs://project-name-staging/sample_file.csv] Error while reading data, error message: CSV table references column position 15, but line starting at position:195 contains only 1 columns. – Bhagesh Arora Dec 26 '19 at 13:52
  • Here I am trying to load only one row, for testing purposes. – Bhagesh Arora Dec 26 '19 at 13:54
  • Probably you have empty rows. – Tiago Medici Apr 21 '21 at 18:30
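
For reference, a minimal sketch of the same load through the google-cloud-bigquery Python client, with max_bad_records set as suggested in the comments above (the bucket, dataset, and table names are the placeholders from the question, and credentials are assumed to be already configured):

from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials and project

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="\u0001",  # the SOH byte; use "^" if the file has literal carets
    max_bad_records=20,        # tolerate up to 20 unparsable rows instead of 0
)

load_job = client.load_table_from_uri(
    "gs://bucket-name/sample_file.csv",
    "dataset_name.table_name",
    job_config=job_config,
)
load_job.result()  # blocks until the job finishes and raises on failure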

3 Answers

2

You are trying to insert wrong values into your table, per the schema you provided.

Based on the table schema and your data example, I ran this command:

./bq load --source_format=CSV  --field_delimiter=$(printf '^') mydataset.testLoad  /Users/tamirklein/data2.csv

1st error

Failure details: - Error while reading data, error message: Could not parse '39b888' as int for field Field2 (position 1) starting at location 0

At this point, I manually removed the b from 39b888, and now I get this:

2nd error

Failure details: - Error while reading data, error message: Could not parse '14/12/2019' as date for field Field8 (position 7) starting at location 0

At this point, I changed 14/12/2019 to 2019-12-14, which is the BigQuery date format, and now everything is OK:

Upload complete. Waiting on bqjob_r9cb3e4ef5ad596e_0000016f42abd4f6_1 ... (0s) Current status: DONE

You will need to clean your data before uploading, or load a larger data sample with the --max_bad_records flag (some of the lines will load and some will not, depending on your data quality).
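
To illustrate the cleaning step, here is a minimal pandas sketch. The column names col1 through col16 are placeholders I am assuming (col2 for the integer field, col8 for the date field); substitute the real names from your schema:

import pandas as pd

# Placeholder names for the 16 schema columns.
names = [f"col{i}" for i in range(1, 17)]
df = pd.read_csv("sample_file.csv", sep="^", names=names, header=None)

# Coerce the integer column: values like '3b9888' become <NA>.
df["col2"] = pd.to_numeric(df["col2"], errors="coerce").astype("Int64")

# Normalize the date column to BigQuery's YYYY-MM-DD format.
df["col8"] = pd.to_datetime(df["col8"], dayfirst=True,
                            errors="coerce").dt.strftime("%Y-%m-%d")

# Drop rows that failed either conversion and write a clean file.
df = df.dropna(subset=["col2", "col8"])
df.to_csv("sample_file_clean.csv", sep="^", header=False, index=False)

Loading the cleaned file with the original bq load command should then succeed, provided the remaining columns also match the schema.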

Note: unfortunately, there is no way to control the date format during the upload; see this answer as a reference.

Tamir Klein
2

We had the same problem while importing data from local files into BigQuery. After examining the data, we saw that some values started with \r or whitespace characters.

After applying ua['ColumnName'].str.strip() and ua['District'].str.rstrip(), we could load the data into BigQuery.
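
For completeness, a self-contained version of that cleanup; ua is the DataFrame name from the answer, and the file names are placeholders:

import pandas as pd

ua = pd.read_csv("local_data.csv")  # placeholder file name

# Strip the stray \r and surrounding whitespace that BigQuery
# was otherwise reading as part of the values.
ua["ColumnName"] = ua["ColumnName"].str.strip()
ua["District"] = ua["District"].str.rstrip()

ua.to_csv("local_data_clean.csv", index=False)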

Thanks

10 Rep
0

We had the same problem while importing data from local files into BigQuery. After testing many options, we found that the cause of the error was a mismatch between the number of columns in the data and the number of columns in the schema. Hope this helps someone.
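
A quick way to locate such rows before loading is to count the fields on each line. A minimal sketch, assuming the file name and ^ delimiter from the question:

import csv

EXPECTED = 16  # number of columns in the table schema

with open("sample_file.csv", newline="") as f:
    for line_no, fields in enumerate(csv.reader(f, delimiter="^"), start=1):
        if len(fields) != EXPECTED:
            print(f"line {line_no}: got {len(fields)} columns, expected {EXPECTED}")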

Aung Myint Thein