
I have a big Pandas data frame with many columns, say, 25. Now I need to create a table in a SQL Server database to store the contents of the data frame. How can I avoid describing each column's parameters manually in the CREATE TABLE command? Is there a solution that uses the data frame's columns as a template? Perhaps there is some way to run pandas.to_sql without the subsequent insertion of rows?

I don't want to insert the rows into the table with pandas.to_sql, because there are too many rows and the command fails. I have to export to CSV and then load it with the bcp utility.
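A minimal sketch of two ways to have pandas derive the table definition from the DataFrame's dtypes without pushing any rows; the connection string, the table name `my_table`, and the toy frame are placeholders, not details from the question:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string -- adjust server, database, credentials
# and ODBC driver name to your environment.
engine = create_engine(
    "mssql+pyodbc://user:password@MYSERVER/MYDB"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Toy frame standing in for the real 25-column DataFrame.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "value": [0.5, 1.5]})

# Option 1: inspect the CREATE TABLE statement pandas would generate.
print(pd.io.sql.get_schema(df, "my_table", con=engine))

# Option 2: create the table itself without inserting any rows --
# to_sql on a zero-row slice builds the table from the column dtypes,
# and the data can then be bulk-loaded with bcp.
df.head(0).to_sql("my_table", engine, if_exists="replace", index=False)
```

Option 1 only returns the DDL string (you can tweak it before running it yourself), while option 2 actually creates the empty table.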

  • You can still use `DataFrame.to_sql`; make use of the `chunksize` parameter, which exports the table to your database in chunks. See the [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html) – Erfan Feb 10 '20 at 15:08
  • You mean that if I write chunksize=0, it will create a table and insert nothing? – Serhii Kushchenko Feb 10 '20 at 15:11
  • Which chunksize value is recommended? – Serhii Kushchenko Feb 10 '20 at 15:12
  • You can try that, but why would you, if your problem is that the data is too big to be exported in one go? Just use `chunksize=10000`, for example. – Erfan Feb 10 '20 at 15:12
  • Is there any way to make pandas.to_sql verbose, i.e. to report 'processing row 100 of 1000000' etc.? – Serhii Kushchenko Feb 10 '20 at 15:13
  • Take a look at this; you can probably iterate over the chunks and print yourself some indication of how much has been processed (see the sketch after these comments): https://stackoverflow.com/questions/25962114/how-to-read-a-6-gb-csv-file-with-pandas – animalknox Feb 10 '20 at 15:20
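Following up on the comment thread: the `chunksize` parameter of `to_sql` does not report progress by itself, but you can slice the DataFrame manually, append one slice at a time, and print progress as you go. A minimal sketch, reusing the placeholder engine and the hypothetical table name `my_table` from above; the toy data and chunk size are illustrative:

```python
import math
import pandas as pd
from sqlalchemy import create_engine

# Same placeholder connection string as above -- adjust to your environment.
engine = create_engine(
    "mssql+pyodbc://user:password@MYSERVER/MYDB"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Toy frame standing in for the real data; use something like 10_000
# as the chunk size for an actual load.
df = pd.DataFrame({"id": range(100), "value": range(100)})
chunksize = 10
n_chunks = math.ceil(len(df) / chunksize)

for i in range(n_chunks):
    chunk = df.iloc[i * chunksize : (i + 1) * chunksize]
    # 'append' assumes the empty table was already created as shown earlier;
    # alternatively use if_exists="replace" on the first iteration only.
    chunk.to_sql("my_table", engine, if_exists="append", index=False)
    done = min((i + 1) * chunksize, len(df))
    print(f"processed row {done} of {len(df)}")
```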

0 Answers