
I've got a Red Hat Linux server and hundreds of CSV files (most of them around 100 MB in size). I have installed the Snowflake ODBC Driver on this server and I want to iteratively load these files into their respective Snowflake tables. I am looking for a solution for this. Can anybody help, please? Thanks.

I am not able to find a solution to load the CSV files. Some of the solutions I found on the internet load just one CSV file, by running an INSERT statement and listing the field names in the INSERT clause. But in my case there are multiple CSV files, and these files have different numbers of columns, so that solution is not feasible.

  • Does this answer your question? [How do I read and write CSV files with Python?](https://stackoverflow.com/questions/41585078/how-do-i-read-and-write-csv-files-with-python) – user_na Mar 18 '23 at 20:08
  • Check [`CREATE FILE FORMAT`](https://docs.snowflake.com/en/sql-reference/sql/create-file-format). Initialize the file format, initialize the tables, put the CSVs on the server and load data from the files directly. You don't even need Python; it can be done via the command-line interface. You will need to check each file anyway, because it's hardly possible to correctly infer the column types of every CSV without manual verification. – Olvin Roght Mar 18 '23 at 20:50
  • Once you have the CSV files on the local server, you could issue a `PUT` command to the ODBC driver to an internal stage on Snowflake. Then, you can issue a `COPY` command to Snowflake to load the file into the table you are looking to load it to. I would recommend getting all of the files to the internal stage and then issuing a single `COPY` command at the end to load them all in bulk. More efficient that way. – Mike Walton Mar 18 '23 at 20:50
  • Also, if you are using Python to control the logic, then you should use the Snowflake Python connector, rather than an ODBC connector. – Mike Walton Mar 18 '23 at 20:51
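
As the PUT/COPY comments above suggest, the whole flow can be driven from the Snowflake Python connector rather than ODBC. Below is a minimal sketch of that pattern for a single target table: stage all the files first, then issue one bulk COPY at the end. The connection parameters, the stage name `CSV_STAGE`, the table name `MY_TABLE`, and the file-format options are placeholders, not anything from the original question.

```python
# Minimal sketch: PUT every local CSV to an internal stage, then one bulk COPY.
# All names and connection details below are assumptions for illustration.
import glob
import os
import snowflake.connector

conn = snowflake.connector.connect(
    user="MY_USER",
    password=os.environ["SNOWFLAKE_PWD"],  # assumed to be set in the environment
    account="my_account",
    warehouse="MY_WH",
    database="MY_DB",
    schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("CREATE STAGE IF NOT EXISTS CSV_STAGE")

# Upload every local CSV to the internal stage first...
for path in glob.glob("/data/csv/*.csv"):
    cur.execute(f"PUT file://{path} @CSV_STAGE AUTO_COMPRESS=TRUE")

# ...then load them all with a single bulk COPY at the end.
cur.execute(
    "COPY INTO MY_TABLE FROM @CSV_STAGE "
    "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
)
cur.close()
conn.close()
```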

1 Answer


You can create a STAGE for each of the CSV files and then PUT the files into their respective stages. From each stage, you can then COPY the data into the tables, mapping one stage to one table (similar to a source-to-target concept). I have quickly written a Python program which may help you. The program is here:

https://github.com/rajib76/sf_ops

If you use Python to wrap the load logic, you may be able to automate this through a scheduler; a rough sketch of the loop follows.
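
For illustration, here is a self-contained sketch of that one-stage-per-table loop (not the linked program itself). The file-to-table naming convention (`sales.csv` → table `SALES` → stage `SALES_STG`), the connection parameters, and the CSV format options are all assumptions.

```python
# Rough sketch of the one-stage-per-table approach described above.
# Naming convention and connection details are assumptions for illustration.
import glob
import os
import snowflake.connector

conn = snowflake.connector.connect(
    user="MY_USER",
    password=os.environ["SNOWFLAKE_PWD"],
    account="my_account",
    warehouse="MY_WH",
    database="MY_DB",
    schema="PUBLIC",
)
cur = conn.cursor()

for path in glob.glob("/data/csv/*.csv"):
    # Assumed convention: sales.csv -> table SALES -> stage SALES_STG
    table = os.path.splitext(os.path.basename(path))[0].upper()
    stage = f"{table}_STG"
    cur.execute(f"CREATE STAGE IF NOT EXISTS {stage}")
    # PUT uploads (and gzip-compresses) the local file into the stage
    cur.execute(f"PUT file://{path} @{stage} AUTO_COMPRESS=TRUE")
    # COPY loads the staged file into the matching table; the table must
    # already exist with columns that match this file's layout
    cur.execute(
        f"COPY INTO {table} FROM @{stage} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )

cur.close()
conn.close()
```

A loop like this can be dropped into cron or any other scheduler once the connection details are externalized.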

Rajib Deb
  • Hi Rajib, I am not able to connect to Snowflake using this connector, hence I am using ODBC. It looks like it has a limit of 64 MB while copying the file. Any idea if there's a workaround? – user4645994 Mar 19 '23 at 10:32
  • I get below error while connecting to Snowflake using snowflake.connector - AttributeError: module 'lib' has no attribute 'ERR_load_RAND_strings' – user4645994 Mar 19 '23 at 10:33
  • It should not happen like that. I added one more function in the code which calls SnowSQL from Python to load the files. You will need to install SnowSQL on your Linux server before running the program. The Python program calls snowsql to PUT the file into the stage. Please try this and see if it works – Rajib Deb Mar 19 '23 at 16:27
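
For reference, calling SnowSQL from Python, as the last comment describes, can be as simple as shelling out with `subprocess`. This is a hypothetical sketch: the named connection `my_conn` (defined in `~/.snowsql/config`) and the stage name are assumptions.

```python
# Hypothetical sketch of shelling out to SnowSQL for the PUT.
# "my_conn" is an assumed named connection in ~/.snowsql/config.
import subprocess

def put_with_snowsql(local_path: str, stage: str) -> None:
    # -c selects a named connection from the SnowSQL config file;
    # -q runs a single query and exits
    subprocess.run(
        ["snowsql", "-c", "my_conn", "-q", f"PUT file://{local_path} @{stage}"],
        check=True,  # raise if snowsql exits non-zero
    )

put_with_snowsql("/data/csv/sales.csv", "SALES_STG")
```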