0

I trying to load a bunch of ExcelSheets into Oracle table. The excelsheets are not consistent. So the tables in Oracle need to be dynamically generated based on the column in the Excelsheets. This is the easiest solution I found in the internet, so I went on with this approach. I'm kinda Novice in Python.

Taking the following structure as an example.

enter image description here

I'm trying to import the following excel into MEM_TABLE table. The DATE field is inconsistent, it can have NA, _ND or Date as values.

The code I've used to load the excel is below

xls_file = pd.ExcelFile("Excel File")

df =  xls_file.parse('Sheet1',parse_dates=True)
df =  df.replace(np.nan, '') #replace NaN with blank string
df = df.replace('NA', '') #replace "NA" with blank string
df = df.replace('_ND', '') #replace "NA" with blank string
df.to_sql('MEM_TABLE', conn_ora, if_exists='append', index = False )

The table that to_sql method created has following structure

MEM_TABLE(
  date    CLOB         NULL,
  emp_nbr NUMBER(19,0) NULL
)

In dataframe it reads something like this

 DATE                       EMP_NBR
0                             2
1  2018-01-04 00:00:00        1
2  2018-01-01 00:00:00        2
3  2018-01-03 00:00:00        1
4                             1
5                             1

Looks like Pandas is auto casting 1/4/2018 to date when creating the DataFrame. The error I'm getting below, which I believe is because the DATE column in table MEM_TABLE is CLOB and in dataFrame the Date field has DATE object. Looks likes the date object can't be insert into a CLOB cell.

TypeError: expecting string or bytes object

If the Dates are in string(i.e "1/4/2018"), I'm not getting any errors.

My question here is Is there anyway to disable the auto Parse on the date field, while the dataframe is getting created?

Gunjan Shakya
  • 194
  • 1
  • 10

2 Answers2

0

When reading in the pandas dataframe, use 'parse_dates' = False

df = xls_file.parse('Sheet1',parse_dates=False)

EHB
  • 1,127
  • 3
  • 13
  • 24
  • I tried it, same issue with that too. The DataFrame has the Date Object in date – Gunjan Shakya Jul 12 '19 at 13:36
  • Looks like parse_dates doesn't work. Found a link [here](https://github.com/pandas-dev/pandas/issues/10001). Not sure if it holds true now. I'd be happy if someone has a workaround for this – Gunjan Shakya Jul 15 '19 at 13:38
0

The work around I used is I created a method undate (Used from a existing solution) to revert back date to original format. And ran the method in all the DataFrame column. The Implementation is brut Force but it works :P . My final Code looks something like this.

def undate(x):
    if pd.isnull(x):
        return x
    try:
        return x.strftime('%d/%m/%Y')
    except AttributeError:
        return x
    except Exception:
        raise

for col in df.columns:
        df[col] = df[col].apply(undate)
Gunjan Shakya
  • 194
  • 1
  • 10