0

I have some excel that has the columns Month and Year and I want to have a newly created column that converts the month and year into a datetime column of the format %d/%m/%Y with the day being the 1st of that month.

Example:

Month Year
3 2021
5 2021

The new column should look like: 01-03-2021.

Datum
01/03/2021
01/05/2021

I have tried this:

import datetime
df = pd.read_excel(r"C:\Users\f0cdnu\Downloads\Test.xlsx")
df['Datum'] = datetime.datetime(df.Year, df.Month,1)
df

Gives cant convert series to int error and this:

df = pd.read_excel(r"C:\Users\f0cdnu\Downloads\Test.xlsx")
df['Datum'] = pd.to_datetime(df.Year*10000+df.Month*100)

gives Mm

Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
Fathi
  • 15
  • 7
  • A `datetime` has no format, it's a binary value. If you want to create a `datetime` instance in Python, you use the `datetime` constructor and pass the values you want. Have you tried *anything*? Reading the year and month values and calling `datetime.datetime(year,month,1)` perhaps? – Panagiotis Kanavos Jul 29 '21 at 15:15
  • Are you trying to create the new column in Excel? Why use Python then? Use a formula to construct a date and change the cell style to whatever you want. `=DATE(B2,A2,1)` will generate a date from year and month values. Dates in Excel are binary values as well, in fact their stored as doubles. Their appearance depends on the cell's style and the current user's locale – Panagiotis Kanavos Jul 29 '21 at 15:19
  • @PanagiotisKanavos I have to use python because there are other preprocessing I still need to do. I tried the datetime.datetime but I get the series cant be converted to int error – Fathi Jul 29 '21 at 15:25
  • That means you have bad data, not that `datetime` doesn't work. Or you tried to pass strings instead of numbers. Excel stores numbers as integers or decimals. Somehow the numbers were converted as strings, or stored as strings by the user, by adding a `'` in front of the number. In any case, post your code and any errors you've encountered. People can't help fix code they can't see or errors they don't know about – Panagiotis Kanavos Jul 29 '21 at 15:35
  • @PanagiotisKanavos I dont believe its bad data, the columns are stored as integers, i just need a way to combine them and make them be in the form d/m/y in a new column – Fathi Jul 30 '21 at 07:09
  • I already answered that. The error you posted is completely different. It says `cant convert series to int error`. `df.Year` is the Year column, not a single year. As for formats, `datetime` has no format. Formats come into play only when you export or display the dataframe. If you export it back to Excel, you'll have to set the style of the date column – Panagiotis Kanavos Jul 30 '21 at 07:11
  • The answers to this probably duplicate question show many different ways of adding a Pandas date column built from other columns [How to convert columns into one datetime column in pandas?](https://stackoverflow.com/questions/19350806/how-to-convert-columns-into-one-datetime-column-in-pandas) – Panagiotis Kanavos Jul 30 '21 at 07:18
  • Another possible duplicate is [Python Pandas custom time format in Excel output](https://stackoverflow.com/questions/25898839/python-pandas-custom-time-format-in-excel-output). As the accepted answer shows, you can specify formats through the `datetime_format` property of the `ExcelWriter` object. – Panagiotis Kanavos Jul 30 '21 at 07:23
  • @PanagiotisKanavos Thanks, tried something from the first link https://stackoverflow.com/questions/19350806/how-to-convert-columns-into-one-datetime-column-in-pandas and it worked. I guess you should reference that the question has been answered there, i dont know how to do that – Fathi Jul 30 '21 at 08:10

1 Answers1

2

A datetime in both Python and Excel has no format. In both cases it's a binary value. In Excel dates are stored as OLE Automation DATE values, also known as OADAte - a 64-bit float where the integral part is an offset from Decemberr 30, 1899 and the fractional part the time of day.

Formats apply only when parsing text into dates or formating dates to strings for export or display. In Excel, the display of a date depends on the cell's style.

Since you use Pandas, the problem becomes:

  1. how to create a new datetime column from parts and
  2. how to control the cell style when exporting to Excel

Create the column

The answers to How to convert columns into one datetime column in pandas? show several ways to add the new column. The most elegant would be :

df['Datum'] = pd.to_datetime(dict(year=df.Year, month=df.Month, day=1))

or

df['Datum'] =pd.to_datetime(df.Year*10000+df.Month*100+1,format='%Y%m%d')

In this case the number is treated as string parsed using the format parameter.

Specify a date format in Excel

The answers to Python Pandas custom time format in Excel output show how to control the display style of datetime columns through the datetime_format property of the ExcelWriter object :

writer = pd.ExcelWriter("time.xlsx",  datetime_format='dd/mm/yyyy')
df.to_excel(writer, "Sheet1")

Pandas uses XlsxWriter to write to Excel. Working with Dates and Time how XlsxWriter works with dates in general and Working with Python Pandas and XlsxWriter how to work with Pandas and how to control formatting.

For example, you can set the default date and time styles for all cells:

writer = pd.ExcelWriter("pandas_datetime.xlsx",
                        engine='xlsxwriter',
                        datetime_format='mmm d yyyy hh:mm:ss',
                        date_format='mmmm dd yyyy')

Or you can specify formats for specific ranges :

# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})

# Set the column width and format.
worksheet.set_column('B:B', 18, format1)

# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)
Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236