IPNI_RNC, PATHID, 2020-11-11 00:00:00, 2020-11-12 00:00:00, 2020-11-13 00:00:00, 2020-11-14 00:00:00, 2020-11-15 00:00:00, 2020-11-16 00:00:00, 2020-11-17 00:00:00, Last Day Violation Count

Above are the column labels after reading the Excel file. After reading it, the df variable has 10 columns, and 7 of the column labels are dates.

My input data set is an Excel file that changes every day, and I want to update it automatically. In Excel, some column labels are dates like 11-Nov-2020 and 12-Nov-2020, but after reading the file they become 2020-11-11 00:00:00, 2020-11-12 00:00:00. I want to keep the column labels as 11-Nov-2020, 12-Nov-2020, either directly when reading the file with pd.read_excel, if that is possible, or by converting them afterwards.

I am very new to Python. Looking forward to your support, and thanks to everyone who has already offered to help.

  • Hi, could you please check the first line? It looks like it needs some reformatting. If you want to show it as a list of lines, you could use the {} button to format it as code. – jottbe Nov 23 '20 at 11:04

2 Answers


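The strftime method of datetime objects (from the datetime module) is what you need.

If dt is a datetime object, you can do the following (%d is the zero-padded day, %b the abbreviated month name and %Y the four-digit year):

dt.strftime("%d-%b-%Y")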

Example:

>>> from datetime import datetime
>>> timestamp = 1528797322
>>> date_time = datetime.fromtimestamp(timestamp)  # converted to your local time zone
>>> print(date_time) 
2018-06-12 11:55:22
>>> print(date_time.strftime("%d-%b-%Y"))
12-Jun-2018
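The same format string should also work on pandas Timestamp objects. Date-formatted header cells usually come back from pd.read_excel as Timestamps (an assumption about your file; you can check with type(df.columns[2])). A small sketch using a date from the question:

```python
import pandas as pd

# A pandas Timestamp, like the date labels shown in the question
ts = pd.Timestamp("2020-11-11")
print(ts)                        # 2020-11-11 00:00:00
print(ts.strftime("%d-%b-%Y"))   # 11-Nov-2020
```

Note that %b uses the current locale, so "Nov" assumes an English or C locale.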

In order to apply a function to certain dataframe columns, use:

datetime_cols_list = ['datetime_col1', 'datetime_col2']  # the names of your datetime columns
for col in datetime_cols_list:
    # format every value of the column as e.g. 24-Feb-2016
    dataframe[col] = dataframe[col].apply(lambda x: x.strftime("%d-%b-%Y"))

There are surely several ways to do this in pandas; this is just what came to mind first.

Example:

import pandas as pd
import numpy as np

np.random.seed(0)
# generate some random datetime values (two separate ranges, one minute apart per row)
rng = pd.date_range('2015-02-24', periods=5, freq='min')
other_dt_col = pd.date_range('2016-02-24', periods=5, freq='min')
df = pd.DataFrame({'Date': rng, 'Date2': other_dt_col, 'Val': np.random.randn(len(rng))})

print(df)
# Output:
#   Date                Date2                Val
# 0 2015-02-24 00:00:00 2016-02-24 00:00:00  1.764052
# 1 2015-02-24 00:01:00 2016-02-24 00:01:00  0.400157
# 2 2015-02-24 00:02:00 2016-02-24 00:02:00  0.978738
# 3 2015-02-24 00:03:00 2016-02-24 00:03:00  2.240893
# 4 2015-02-24 00:04:00 2016-02-24 00:04:00  1.867558

datetime_cols_list = ['Date', 'Date2']
for col in datetime_cols_list:
    df[col] = df[col].apply(lambda x: x.strftime("%d-%b-%Y"))

print(df)
# Output:
#    Date         Date2        Val
# 0  24-Feb-2015  24-Feb-2016  1.764052
# 1  24-Feb-2015  24-Feb-2016  0.400157
# 2  24-Feb-2015  24-Feb-2016  0.978738
# 3  24-Feb-2015  24-Feb-2016  2.240893
# 4  24-Feb-2015  24-Feb-2016  1.867558
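One of the other ways: for datetime64 columns, pandas has a vectorized .dt.strftime, so the per-value lambda is not needed. A minimal sketch with small illustrative data (the column names just mirror the example above):

```python
import pandas as pd

# Two datetime64 columns and one numeric column (illustrative data)
df = pd.DataFrame({
    "Date": pd.date_range("2015-02-24", periods=3, freq="min"),
    "Date2": pd.date_range("2016-02-24", periods=3, freq="min"),
    "Val": [1.0, 2.0, 3.0],
})

datetime_cols_list = ["Date", "Date2"]
for col in datetime_cols_list:
    # .dt.strftime formats the whole column at once and returns strings
    df[col] = df[col].dt.strftime("%d-%b-%Y")

print(df)
```

Like the loop above, this changes the values inside the columns, not the column labels.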
liakoyras

You can of course use the standard Python methods to parse the date values, but I would not recommend it, because that way you end up with plain Python objects instead of the pandas representation of dates. That means they consume more space, are probably less efficient, and you can't use the pandas methods to access, e.g., the year. I'll show what I mean below. To fix the naming of your columns, you could prevent pandas from assigning the names automatically and instead read the first line as data, so that you can build the names yourself; the section below shows the other option, renaming the columns after reading the file.
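A minimal sketch of the "read the first line as data" idea, assuming pd.read_excel(path, header=None) returns the header cells as the first row. To keep it runnable without an Excel file, the raw frame is built by hand with hypothetical sample values, and label_to_str is an illustrative helper name:

```python
import pandas as pd
from datetime import datetime

# Simulates what pd.read_excel(path, header=None) might return:
# the header row arrives as row 0 (hypothetical sample values)
raw = pd.DataFrame([
    ["IPNI_RNC", "PATHID", datetime(2020, 11, 11), datetime(2020, 11, 12), "Last Day Violation Count"],
    ["RNC1", 101, 3, 5, 5],
])

def label_to_str(value):
    # hypothetical helper: dates become '11-Nov-2020', other labels become plain strings
    if hasattr(value, "strftime"):
        return value.strftime("%d-%b-%Y")
    return str(value)

header = [label_to_str(v) for v in raw.iloc[0]]
df = raw.iloc[1:].reset_index(drop=True)  # drop the header row from the data
df.columns = header
print(list(df.columns))
```

Since the data columns were read together with the header row, they may end up with object dtype; df.infer_objects() can help restore numeric types afterwards.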

The type conversion part:

# create a test setup with a small dataframe
import pandas as pd
from datetime import date, datetime, timedelta
df= pd.DataFrame(dict(id=range(10), date_string=[str(datetime.now()+ timedelta(days=d)) for d in range(10)]))

# test the python way (parse each string into a plain Python date object):
df['date_val_python']= df['date_string'].map(lambda s: datetime.fromisoformat(s).date())

# use the pandas way: (btw. if you want to explicitly
# specify the format, you can use the format= keyword)
df['date_val_pandas']= pd.to_datetime(df['date_string'])
df.dtypes

The output is:

id                          int64
date_string                object
date_val_python            object
date_val_pandas    datetime64[ns]
dtype: object

As you can see, date_val_python has dtype object, because it contains plain Python objects (here of class datetime.date), while date_val_pandas uses the internal datetime representation of pandas. You can now try:

df['date_val_pandas'].dt.year
# this will return a series with the year part of the date

df['date_val_python'].dt.year
# this will result in the following error:
AttributeError: Can only use .dt accessor with datetimelike values

See the pandas doc for to_datetime for more details.
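As a small illustration of the format= keyword mentioned above, here is a sketch that parses strings in the question's 11-Nov-2020 style and then uses the .dt accessor (sample values are made up; %b again assumes an English or C locale):

```python
import pandas as pd

# Strings in the '11-Nov-2020' style used in the question
s = pd.Series(["11-Nov-2020", "12-Nov-2020", "13-Nov-2020"])

# format= tells to_datetime exactly how to parse instead of guessing
parsed = pd.to_datetime(s, format="%d-%b-%Y")
print(parsed.dtype)
print(parsed.dt.year.tolist())  # [2020, 2020, 2020]
print(parsed.dt.day.tolist())   # [11, 12, 13]
```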

The column naming part:

# read your dataframe as usual
df = pd.read_excel('c:/scratch/tmp/dates.xlsx')
rename_dict = dict()
for old_name in df.columns:
    # date headers come back as Timestamp/datetime objects, which have strftime
    if hasattr(old_name, 'strftime'):
        new_name = old_name.strftime('%d-%b-%Y')  # e.g. 11-Nov-2020
        rename_dict[old_name] = new_name
if len(rename_dict) > 0:
    df.rename(columns=rename_dict, inplace=True)

This works if your column titles are stored as regular Excel dates, which I suppose is the case, because they come back with a time part after importing.
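DataFrame.rename also accepts a function for columns=, which is applied to every label, so the dictionary can be skipped. A compact sketch; the frame below only imitates read_excel output with Timestamp headers (illustrative data), and format_label is a hypothetical helper name:

```python
import pandas as pd

# Illustrative frame whose labels mimic read_excel output for date headers
df = pd.DataFrame(
    [["RNC1", 101, 3, 5, 5]],
    columns=["IPNI_RNC", "PATHID", pd.Timestamp("2020-11-11"),
             pd.Timestamp("2020-11-12"), "Last Day Violation Count"],
)

def format_label(label):
    # hypothetical helper: date labels get the Excel-style text, others are untouched
    if hasattr(label, "strftime"):
        return label.strftime("%d-%b-%Y")
    return label

# rename applies the function to each column label
df = df.rename(columns=format_label)
print(list(df.columns))
```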

jottbe