
I read a large Excel file into pandas using .read_excel, and the file has date columns. When read into pandas, the dates default to a timestamp. Since the file is large, I would like to read the dates as a string.

If that is not possible, then I would at least like to export the date back to Excel in the same format as it is in the original file (e.g. "8/18/2009").

My two questions are:

  1. Can I avoid converting the Excel date into a timestamp in pandas?
  2. If not possible, how can I write back the date in the original format efficiently?
Daniel Walker
user18101
  • "When read into pandas the date defaults to a timestamp or, at least, when I export it back to Excel." Which of the two is it? – IanS Feb 23 '16 at 13:46
  • According to the comments in this question, there is no way to avoid converting Excel dates into timestamps: http://stackoverflow.com/questions/34156830/leave-dates-as-strings-using-read-excel-function-from-pandas-in-python – IanS Feb 23 '16 at 13:51
  • You could try this: http://stackoverflow.com/a/28769537/5276797 – IanS Feb 23 '16 at 13:52
  • The code "f.write(vbscript.encode('utf-8'))" from the third comment doesn't work in python 3. I put it in the 2to3 converter and it didn't make changes. Any suggestions? – user18101 Feb 23 '16 at 17:47
  • What is the error message? – IanS Feb 23 '16 at 20:11
  • It wasn't in binary, I changed "f = open('ExcelToCsv.vbs','w')" to "f = open('ExcelToCsv.vbs','wb')" – user18101 Feb 23 '16 at 20:25
  • You could try to ask the author of the answer, by adding a comment to his answer. This is outside of my area of expertise unfortunately. – IanS Feb 23 '16 at 20:32
  • The problem is that Excel doesn't store dates as strings, it stores them as numbers with a special format code. – Mark Ransom Mar 09 '21 at 01:13
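
To illustrate the last comment, here is a minimal sketch of how a stored Excel date number maps to a calendar date and to text such as "8/18/2009". It uses only the Python standard library. The function names `excel_serial_to_date` and `format_like_excel` are hypothetical, and the sketch assumes the workbook uses Excel's default 1900 date system and that the serial falls after February 1900.

```python
from datetime import date, timedelta

# Assumption: the 1900 date system. Using 1899-12-30 as the epoch absorbs
# Excel's fictitious 29 February 1900 for all serials after that date.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial: int) -> date:
    """Convert a whole-day Excel serial number to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def format_like_excel(d: date) -> str:
    """Format a date as M/D/YYYY without zero padding."""
    # Plain attribute access avoids the non-portable '%-m' strftime flag.
    return f"{d.month}/{d.day}/{d.year}"


serial = 40043  # the number Excel stores for 18 August 2009
as_date = excel_serial_to_date(serial)
as_text = format_like_excel(as_date)
print(as_date, as_text)
```

The display format lives in the cell's format code, not in the stored value, which is why a reader sees a number or a datetime rather than the text shown in Excel.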

3 Answers

  1. I am not sure how to read the dates with read_excel without converting them to timestamps.
  2. Because the dates are already converted to datetime when they are read into a DataFrame, here is how they can be written back in the original format. I have used 'mm/dd/yyyy'.
import pandas as pd

df = pd.read_excel(
    "file_to_read.xlsx",
    sheet_name="sheetname",
)
writer = pd.ExcelWriter(
    "file_to_write.xlsx",
    engine="xlsxwriter",
    datetime_format="mm/dd/yyyy",
)
df.to_excel(
    writer,
    index=False,
    header=True,
    sheet_name="sheetname",
)
# Close the writer so the workbook is actually saved to disk.
writer.close()
Jack Deeth

This is similar to the issue discussed in "Leave dates as strings using read_excel function from pandas in python".

Check the answers there:

  • Use the converters={'Date': str} option of pandas.read_excel:
    pandas.read_excel(xlsx, sheet, converters={'Date': str})
  • Or convert the timestamp back to a formatted string, for example:
    df['Date'][0].strftime('%Y/%m/%d')
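
As a rough sketch of the second suggestion applied to a whole column rather than a single cell: the DataFrame below is built in memory as a stand-in for the result of read_excel, the column name `Date` is taken from the snippet above, and the helper `format_dates_as_text` is a hypothetical name. It assumes the column has no missing dates.

```python
import pandas as pd


def format_dates_as_text(series: pd.Series) -> pd.Series:
    """Render a datetime column as M/D/YYYY strings without zero padding.

    Assumes no missing values: NaT would turn the parts into floats.
    """
    return (
        series.dt.month.astype(str)
        + "/"
        + series.dt.day.astype(str)
        + "/"
        + series.dt.year.astype(str)
    )


# Stand-in for a DataFrame returned by pd.read_excel.
df = pd.DataFrame({"Date": pd.to_datetime(["2009-08-18", "2010-12-01"])})

# Vectorized strftime gives zero-padded text such as 08/18/2009.
df["Date_padded"] = df["Date"].dt.strftime("%m/%d/%Y")

# Building the string from its parts drops the padding, e.g. 8/18/2009.
df["Date_text"] = format_dates_as_text(df["Date"])

print(df)
```

Either column is written to Excel as plain text, so the cells will no longer behave as dates there.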
Community
YDD9

I had the same problem. This is what solved the issue for me:

df = pd.read_excel(excel_link, sheet_name, dtype=str)

This works if you don't mind converting the entire DataFrame, or an entire column, to strings.