How to set datetime format for pandas dataframe column labels?

Question

IPNI_RNC PATHID 2020-11-11 00:00:00 2020-11-12 00:00:00 2020-11-13 00:00:00 2020-11-14 00:00:00 2020-11-15 00:00:00 2020-11-16 00:00:00 2020-11-17 00:00:00 Last Day Violation Count

Above are the column labels after reading the Excel file. The df variable has 10 columns after reading the file, and 7 of the column labels are dates.

My input data set is an Excel file which changes every day, and I want to update it automatically. In Excel, some column labels are dates like 11-Nov-2020, 12-Nov-2020, but after reading the file they become 2020-11-11 00:00:00, 2020-11-12 00:00:00. I want to keep the column labels as 11-Nov-2020, 12-Nov-2020 while reading the file with pd.read_excel if possible; otherwise I need to convert them afterwards.

I am very new to Python. Looking forward to your support. Thanks to everyone who has already come forward to help me.

Hi, could you please check the first line? It looks like it needs some reformatting. If you want to have it as a list of lines, you could use the {} button to reformat it as code. — jottbe, Nov 23 '20 at 11:04

liakoyras · Answer 1 · 2020-11-23T09:21:35.730

The strftime method of datetime objects (from the datetime module) is what you need:

If datetime is a datetime object, you can do

datetime.strftime("%d-%b-%Y")

Example:

>>> from datetime import datetime
>>> timestamp = 1528797322
>>> date_time = datetime.fromtimestamp(timestamp)
>>> print(date_time) 
2018-06-12 11:55:22
>>> print(date_time.strftime("%d-%b-%Y"))
12-Jun-2018

In order to apply a function to certain dataframe columns, use:

datetime_cols_list = ['datetime_col1', 'datetime_col2', ...]
for col in dataframe.columns:
    if col in datetime_cols_list:
        dataframe[col] = dataframe[col].apply(lambda x: x.strftime("%d-%b-%Y"))

I am sure this can be done in multiple ways in pandas; this is just what came off the top of my head.

Example:

import pandas as pd
import numpy as np

np.random.seed(0)
# generate some random datetime values
# ('min' is the minute frequency alias; the older 'T' alias is deprecated in recent pandas)
rng = pd.date_range('2015-02-24', periods=5, freq='min')
other_dt_col = pd.date_range('2016-02-24', periods=5, freq='min')
df = pd.DataFrame({'Date': rng, 'Date2': other_dt_col, 'Val': np.random.randn(len(rng))})

print(df)
# Output:
#   Date                Date2                Val
# 0 2015-02-24 00:00:00 2016-02-24 00:00:00  1.764052
# 1 2015-02-24 00:01:00 2016-02-24 00:01:00  0.400157
# 2 2015-02-24 00:02:00 2016-02-24 00:02:00  0.978738
# 3 2015-02-24 00:03:00 2016-02-24 00:03:00  2.240893
# 4 2015-02-24 00:04:00 2016-02-24 00:04:00  1.867558

# format only the columns that hold datetime values
datetime_cols_list = ['Date', 'Date2']
for col in df.columns:
    if col in datetime_cols_list:
        df[col] = df[col].apply(lambda x: x.strftime("%d-%b-%Y"))

print(df)
# Output:
#    Date         Date2        Val
# 0  24-Feb-2015  24-Feb-2016  1.764052
# 1  24-Feb-2015  24-Feb-2016  0.400157
# 2  24-Feb-2015  24-Feb-2016  0.978738
# 3  24-Feb-2015  24-Feb-2016  2.240893
# 4  24-Feb-2015  24-Feb-2016  1.867558

Use `.dt.strftime("%d-%b-%Y")` instead of `.apply(lambda x: x.strftime("%d-%b-%Y"))` — jezrael, Nov 23 '20 at 11:06
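A minimal sketch of the vectorized variant suggested in this comment, assuming the columns already have a datetime64 dtype. The frame, its column names, and the values are made up for illustration, and the month abbreviation produced by %b depends on the active locale.

```python
import pandas as pd

# Hypothetical frame: two datetime64 columns and one numeric column.
df = pd.DataFrame({
    "Date": pd.date_range("2015-02-24", periods=3, freq="D"),
    "Date2": pd.date_range("2016-02-24", periods=3, freq="D"),
    "Val": [1.0, 2.0, 3.0],
})

# Pick the datetime columns by dtype instead of listing them by hand.
datetime_cols = df.select_dtypes(include="datetime").columns

for col in datetime_cols:
    # .dt.strftime formats the whole column at once and returns strings.
    df[col] = df[col].dt.strftime("%d-%b-%Y")

print(df)
```

Note that the formatted columns hold strings afterwards, so date arithmetic and the .dt accessor are no longer available on them.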

jottbe · Accepted Answer · 2020-11-23T17:11:35.983

You can of course use the standard Python methods to parse the date values, but I would not recommend it, because this way you can end up with plain Python objects (dtype object) and not with the pandas representation of dates. That means it consumes more space, is probably not as efficient, and you can't use the pandas methods to access e.g. the year. I'll show you what I mean below. If you want to avoid the naming issue with your column names, you might try to prevent pandas from assigning the names automatically and read the first line as data, so you can fix it yourself (see the column naming part below).

The type conversion part:

# create a test setup with a small dataframe
import pandas as pd
from datetime import date, datetime, timedelta
df= pd.DataFrame(dict(id=range(10), date_string=[str(datetime.now()+ timedelta(days=d)) for d in range(10)]))

# test the python way:
df['date_val_python']= df['date_string'].map(lambda dt: str(dt))

# use the pandas way: (btw. if you want to explicitly
# specify the format, you can use the format= keyword)
df['date_val_pandas']= pd.to_datetime(df['date_string'])
df.dtypes

The output is:

id                          int64
date_string                object
date_val_python            object
date_val_pandas    datetime64[ns]
dtype: object

As you can see, date_val_python has dtype object: the map call above only produces plain Python strings, which pandas stores as generic objects, while date_val_pandas uses the internal datetime representation of pandas. You can now try:

df['date_val_pandas'].dt.year
# this will return a series with the year part of the date

df['date_val_python'].dt.year
# this will result in the following error:
AttributeError: Can only use .dt accessor with datetimelike values

See the pandas doc for to_datetime for more details.
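As a small illustration of the format= keyword mentioned in the code comment above, here is a sketch that parses strings in the day-month-year layout from the question. The sample values are hypothetical, and %b assumes English month abbreviations.

```python
import pandas as pd

# Hypothetical strings in the layout the question wants (e.g. 11-Nov-2020).
raw = pd.Series(["11-Nov-2020", "12-Nov-2020", "13-Nov-2020"])

# format= tells pandas exactly how to read each string.
parsed = pd.to_datetime(raw, format="%d-%b-%Y")

print(parsed.dtype)            # a datetime64 dtype, not object
print(parsed.dt.day.tolist())  # the .dt accessor works on the parsed column
```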

The column naming part:

# read your dataframe as usual
df= pd.read_excel('c:/scratch/tmp/dates.xlsx')
rename_dict= dict()
for old_name in df.columns:
    if hasattr(old_name, 'strftime'):
        # strftime expects %-style codes: %d = day, %b = abbreviated month, %Y = 4-digit year
        new_name= old_name.strftime('%d-%b-%Y')
        rename_dict[old_name]= new_name
if len(rename_dict) > 0:
    df.rename(columns=rename_dict, inplace=True)

This works if your column titles are stored as regular dates in Excel, which I suppose is true, because you get a time part after importing them.
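The same renaming can be tried without an Excel file. The sketch below builds a small stand-in frame whose column labels mix strings and datetime objects, on the assumption that this is what pd.read_excel hands back for date headers; the row values are invented.

```python
from datetime import datetime

import pandas as pd

# Stand-in for the frame returned by pd.read_excel (assumption):
# date headers arrive as datetime objects, other headers as strings.
df = pd.DataFrame(
    [["rnc1", 7, 1, 2]],
    columns=["IPNI_RNC", "PATHID", datetime(2020, 11, 11), datetime(2020, 11, 12)],
)

# Format only the datetime labels and leave the string labels untouched.
df.columns = [
    col.strftime("%d-%b-%Y") if isinstance(col, datetime) else col
    for col in df.columns
]

print(list(df.columns))
```

Assigning a new list to df.columns avoids building a rename dictionary; the labels become plain strings, so they no longer sort or compare as dates.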

@liakoyras, thank you. I think it should now cover all aspects of the question. — jottbe, Nov 23 '20 at 17:12
You're welcome. If you like you can mark it as an answer then. — jottbe, Nov 24 '20 at 10:04
