0

I'm reading in Excel files into pandas and then attempting to insert the values into postgres db. I've converted the data into python dates using the following code:

date_val = 42565.0
year, month, day, hour, minute, sec = xlrd.xldate_as_tuple(date_val, 0)
py_date = "%02d.%02d.%04d" % (day, month, year)

I haven't though been able to find a solution for passing empty values in the date columns to postgres using (psycopg2), python 2.7.12. Some of the values in the columns are empty and pd.read_excel is parsing those values to NaN. The error I am getting when trying to insert the data to postres db is:

<class 'sqlalchemy.exc.ProgrammingError'>, ProgrammingError('(psycopg2.ProgrammingError) column "created_at" is of type date but expression is of type double precision\nLINE 1: ...at, 0.0, \'NaN\'::float, 9410, 921, 111, \'NaN\'::flo...\n  

I have tried to fill in those cells with empty string, zero as integer or string but I recieve an error everytime.

df['created_at'].fillna()

Any ideas how I should parse these values?

Thank you.

Jasen
  • 11,837
  • 2
  • 30
  • 48
OAK
  • 2,994
  • 9
  • 36
  • 49
  • 1
    What is the data type of `df['created_at']`? If it's `object`, then `fillna(None)` should work. If it's a datetime column, you may have to pick some date (e.g., `1970-01-01`) for these values (I haven't tried, `NaT` might be mapped to NULL). – cco Jan 09 '17 at 19:41
  • I'd say just convert numbers you get from excel to integers and ask postgresql to add them to `'1899-12-31'::date` , but I'm guessing sqlalchemy might make that hard. – Jasen Jan 09 '17 at 20:17
  • @cco thanks for your input. It is a date and so I will probably take your suggestion to replce the empty values with a fake date. It appears that might be the solution but now I will have to convert values of the different columns to the required data dtype using the `converters` argument in `pd.read_excel` as the types have been converted to int64, object and float64 causing an additional conflict in data types between python and the db. – OAK Jan 10 '17 at 15:03

1 Answers1

0

Excel stores dates as numeric values. You need to convert them to datetime values, see How do I read a date in Excel format in Python? for a good discussion and solution.

Community
  • 1
  • 1
cco
  • 5,873
  • 1
  • 16
  • 21
  • Thanks for this. I'll update the post in a minute to how I convert it to date but I'm getting still an error which I am unsure of the origin of it. – OAK Jan 09 '17 at 17:00
  • Is there a way of finding out which column or data field is responsible for generating this error `(, DataError('(psycopg2.DataError) integer out of range\n',)`? I began this development using the `pd.read_excel` module and now I am trying the `pd.read_csv` module as I have been reading that it is possible to control the data types as opposed to the first approach which does not yet have the ability to set a dtype on the columns.Not sure how to debug this error. I am wondering whether this is a config issue with the schema or something related to pd conversion – OAK Jan 10 '17 at 18:23