5

i need to read a excel file without changing any date , time format , float format and convert to data-frame. This is working fine if i convert the excel to CSV and read it using read_csv() .

eg:

import pandas as pd
import numpy as np
#code for reading excel
df=pd.read_excel("605.xlsx",parse_dates=False,sheet_name="Group 1",keep_default_na=False,dtype=str)
print("df_excel:")
#code for reading csv   

df1=pd.read_csv("Group 1.csv",parse_dates=False,dtype=str,na_filter = False)
 print("df_csv:",df1)

output: enter image description here

in the above code parse_dates=False is working fine while reading CSV file, but parse_dates=False is not working in read_excel()

Expected output: Need the exact excel data into a data-frame without changing the date , time format.

FObersteiner
  • 22,500
  • 8
  • 42
  • 72
prabhaahar
  • 93
  • 1
  • 6
  • What version of pandas are you running? – JAV Jul 24 '20 at 06:31
  • pandas version -> 1.0.4 – prabhaahar Jul 24 '20 at 08:30
  • 1
    see my answer [here](https://stackoverflow.com/a/70884740/10197418) - If cells are formatted a certain way in Excel (e.g. as date), Python will adopt that format, by using the appropriate type. `pandas` does some "magic" in-between (which I don't find useful), but in any case, you'll either have to change format in Excel, or save as `csv` (w/o any such "format"), or do post-processing in Python/pandas. – FObersteiner Jan 28 '22 at 09:36

1 Answers1

0

From the Pandas docs on the parse_dates parameter for read_excel():

If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. If you don`t want to parse some cells as date just change their type in Excel to “Text”.

You could try this:

df = pd.read_excel("605.xlsx",parse_dates=False,sheet_name="Group1",keep_default_na=False,dtype=str, converters={'as_at_date': str})

Explicitly converting the date column to string might help.

lok1
  • 66
  • 5
  • i tried with the above line of code , still the output date value is auto parsed. – prabhaahar Jul 24 '20 at 08:39
  • What about if you pass in dtype={'as_at_date': object} ? Alternatively, try object instead of str in the converter parameter? – lok1 Jul 24 '20 at 08:49
  • i tried with the below line , still the the date is auto parsed. df = pd.read_excel("6054.xlsx",parse_dates=False,sheet_name="Group 1 Attribute",keep_default_na=False,converters={'as_at_date': str},dtype={'as_at_date': object}) – prabhaahar Jul 24 '20 at 09:07
  • Forgot to mention that you should remove the converters parameter if you use "dtype={'as_at_date': object}. You could also try to pass in object instead of str in the converters parameter also – lok1 Jul 24 '20 at 09:13
  • i tried both the ways. 1) changing dtype: df = pd.read_excel("6054.xlsx",parse_dates=False,sheet_name="Group 1 Attribute",keep_default_na=False,dtype={'as_at_date': object}) . Still the data format is parsed. 2) Object in converter parameter : df = pd.read_excel("6054.xlsx",parse_dates=False,sheet_name="Group 1 Attribute",keep_default_na=False,converters={'as_at_date': object}) this line is throwing below error : – prabhaahar Jul 24 '20 at 09:27
  • TypeError: object() takes no arguments – prabhaahar Jul 24 '20 at 09:28
  • Very weird, dtype={'as_at_date': object} should preserve the column as its stored in the excel file. I don't know if the parameter order affects anything, i have no file i can test this on. I guess you could try removing all parameters except filename, sheetname and dtype, if that dosen't work I dno – lok1 Jul 24 '20 at 09:45
  • even after removing the other parameter the date value is getting converted. can you please provide me your e-mail id , so that i can send you the input file – prabhaahar Jul 27 '20 at 06:27