1

I am trying to create a table from a CSV using the schema autodetect option. It fails because some rows / columns have values that do not conform to the auto detected type. I would like to change the type for those columns to STRING.

Is there a way to export the autodetected schema so I can update it and use it to load. The CSV has 30+ columns and I would like to avoid having to manually generate a schema file for all the columns.

Update This question is not a duplicate of this. The latter is a solution to the case where the table already exists. In this question there is no existing table whose schema can be exported.

Rohit Namjoshi
  • 669
  • 5
  • 17
  • Possible duplicate of [Is there a way to export a BigQuery table's schema as JSON?](https://stackoverflow.com/questions/43195143/is-there-a-way-to-export-a-bigquery-tables-schema-as-json) – Harald Gliebe Mar 28 '19 at 19:07
  • 1
    @Harald That answer works for existing tables. In my situation, the table creation fails because of the schema mismatch. There is no source table. – Rohit Namjoshi Mar 28 '19 at 19:13
  • 1
    Maybe a silly way... If you leave only couple of lines of your csv, could it pass autodetect? If so, export schema then edit. If not, since it is only couple of lines, manual edit the non-conform column into "text" shouldn't be too hard. – Yun Zhang Mar 28 '19 at 20:51
  • @YunZhang Thanks for the suggestion. It works! I have dozens of CSV files to load so I wrote a script to extract 200 rows from a file load it, if load fails try a different sample of 200 rows until it succeeds. Then extract the schema and manually edit till the entire load succeeds. Since you work on BQ, perhaps you could create a feature request to allow logging of the autodetected schema in case of failure. – Rohit Namjoshi Mar 28 '19 at 23:20
  • You can suggest feature here: https://cloud.google.com/support/docs/issue-trackers. It is more convincing for a feature to be suggested by user :) – Yun Zhang Mar 29 '19 at 01:11
  • @YunZhang [Done](https://issuetracker.google.com/issues/129530312). 多谢 – Rohit Namjoshi Mar 29 '19 at 22:04
  • Have you tried these 2 options? --ignore_unknown_values --max_bad_records? https://cloud.google.com/bigquery/docs/reference/bq-cli-reference#bq_load. Would they allow you to create a table then dump schema? – Yun Zhang Mar 30 '19 at 00:15

0 Answers0