
I'm wondering if I could use pandas to guess what the type of each column is. For example, if I have the following string passed to pandas:

("store_location_1", 1.23, 1, 2015-02-02)

Is there any way pandas could guess that the types for the columns are VARCHAR, FLOAT, INTEGER, and DATETIME, respectively?

My end goal is to use pandas to correctly create a schema given a CSV header. The example string above could be the first data row in a large CSV file that I need to import into a PostgreSQL database. But in order to do this, I have to create a table first, and hence need a table schema.
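To make that concrete, here's roughly what I'm hoping for (a minimal sketch with made-up column names, relying on `read_csv`'s usual dtype inference plus `parse_dates` for the date column):

```python
import io
import pandas as pd

# Hypothetical one-row sample standing in for the real CSV file.
sample = io.StringIO(
    "store,price,qty,date\n"
    "store_location_1,1.23,1,2015-02-02\n"
)

df = pd.read_csv(sample, parse_dates=["date"])
print(df.dtypes)
# store             object          -> VARCHAR
# price             float64         -> FLOAT / DOUBLE PRECISION
# qty               int64           -> INTEGER
# date              datetime64[ns]  -> TIMESTAMP
```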

Thank you for your suggestions!

  • Have a look at http://stackoverflow.com/questions/15891038/pandas-change-data-type-of-columns. One issue: because pandas DataFrames are a collection of columns (one Series per column), a column can only be converted to numeric or date types if all the rows conform or there is a default value for non-conforming rows. – Paul Oct 20 '15 at 22:37
  • Also, the conversion problem has been solved before, see [How to insert pandas dataframe into mysql](http://stackoverflow.com/questions/16476413/how-to-insert-pandas-dataframe-via-mysqldb-into-database) – Paul Oct 20 '15 at 22:44
  • Hey Paul, thanks for the quick reply. I'm working with really large CSV files and want to know if pandas' `to_sql` can efficiently handle gigabytes worth of CSV? Right now I'm using Psycopg's (http://initd.org/psycopg/docs/cursor.html) `copy_expert` function and it's working out pretty well. – LTran Oct 20 '15 at 22:51
  • @LTran If you have gigabytes of data, you probably don't want to pass your data through python (the conversion of the data to python objects and inserting them with psycopg2 (which pandas and sqlalchemy use under the hood) is rather slow). So I would stick with the `COPY` approach if that is working well. – joris Oct 21 '15 at 08:02
  • @LTran I seem to recall pandas loads the entire CSV into memory. So depending on the size of a multi-GB file, you may run out of memory. Amazon EC2 has hourly rentals of VMs with lots of memory (look for type R3). – Paul Oct 21 '15 at 14:31
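One way the suggestions above could fit together is to infer the schema from a small sample with pandas and then bulk-load the full file with `COPY`. A rough sketch, assuming pandas' semi-public `pandas.io.sql.get_schema` helper and psycopg2's `copy_expert`; the file name, table name, date column, and connection string are all placeholders:

```python
import pandas as pd
import psycopg2
from pandas.io import sql as pd_sql

CSV_PATH = "big_file.csv"   # placeholder file name
TABLE = "sales"             # placeholder table name

# Infer column types from a small sample so the multi-GB file
# never has to fit in memory (nrows limits how much read_csv reads).
# "date" is a hypothetical column name from the CSV header.
sample = pd.read_csv(CSV_PATH, nrows=1000, parse_dates=["date"])

# get_schema is a semi-public pandas helper; without a connection it
# emits generic SQL (TEXT/REAL/INTEGER/TIMESTAMP), which PostgreSQL
# accepts, though you may want to tweak the types by hand.
create_table = pd_sql.get_schema(sample, TABLE)

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection
with conn, conn.cursor() as cur:
    cur.execute(create_table)
    # Stream the full file into PostgreSQL with COPY via copy_expert.
    with open(CSV_PATH) as f:
        cur.copy_expert(
            'COPY "{}" FROM STDIN WITH (FORMAT csv, HEADER true)'.format(TABLE),
            f,
        )
```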

0 Answers