5

My project involves taking a wide array of SQL queries, converting them to dataframes in Python via pandas, filtering by a user name column, and then outputting the filtered dataframes to individual HTML files, which are converted to PDFs and published.

I'm stuck on an issue where the date columns appear in the HTML as YYYY-MM-DD HH:MM:SS, but I'd just like them to be YYYY-MM-DD to make the PDF output look cleaner.

I know that I can convert a single named column to the desired output using:

dataset['datecolumn'] = dataset['datecolumn'].dt.date

I did manage to solve the problem using the above approach by getting a list of the column names and then applying a for loop as below:

datecollist = list(dataset.select_dtypes(include=['datetime64']).columns)
for i in datecollist:
    dataset[i] = dataset[i].dt.date

However, I would really like to figure out a more Pythonic way of fixing this problem, both to avoid a further for loop in my code and to improve my Python ability.

So far I attempted a mapper function without any luck:

mapper = lambda x: x.dt.date if isinstance(x, np.datetime64) else x
dataset.columns = dataset.columns.map(mapper)

It just seemed to skip the columns that I'm looking to convert. I noticed that when I check the dtype of these columns they show up as:

in: dataset['First Date'].dtype
out: dtype('<M8[ns]')

Can anyone suggest where my mapper approach is going wrong, or show me a more efficient/cleaner way of solving the problem?

Peter Connolly

1 Answer

3

Another option, since your data is coming from SQL, is to convert all datetimes to dates upfront, in the SQL. For example, using:

How to return only the Date from a SQL Server DateTime datatype

That would only work if you never needed the full datetime in your code - i.e. if all your logic only used dates.
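To sketch what that looks like from the Python side - here using an in-memory SQLite database with made-up table and column names purely for illustration; SQL Server would use CAST(col AS date) as in the linked question:

```python
import sqlite3
import pandas as pd

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (user_name TEXT, created_at TEXT)")
conn.execute("INSERT INTO reports VALUES ('alice', '2023-05-01 14:30:00')")

# Cast to a date in the SQL itself (SQLite's date() function here),
# so the dataframe never sees the time component at all.
df = pd.read_sql(
    "SELECT user_name, date(created_at) AS created_at FROM reports",
    conn,
)
print(df['created_at'].iloc[0])  # 2023-05-01
```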

Though I too like the elegance of comprehensions (not having to write for loops!), I sometimes find them unwieldy and hard to read, especially with conditional logic and mapping functions. They are also harder to error-check, as they attempt to apply multiple operations all at once.
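As an aside on why the mapper attempt skipped those columns: dataset.columns.map(mapper) maps over the column *labels*, which are plain strings, not over the column data - so isinstance(x, np.datetime64) is never true. Mapping over the columns' data instead, with a dtype check on each Series, does work; a minimal sketch (column names are made up):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['alice'],
    'First Date': pd.to_datetime(['2023-05-01 14:30:00']),
})

# df.apply iterates over each column as a Series, so we can test its
# dtype and recast only the datetime columns, leaving the rest alone.
df = df.apply(
    lambda s: s.dt.date if pd.api.types.is_datetime64_any_dtype(s) else s
)
```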

In this case, a simple check and recast is what I use.

Generate some testable data:

import pandas as pd
import numpy as np
from datetime import datetime

def make_data(n):

    ts1 = datetime.now().timestamp() + abs(np.random.normal(1000, 30, n)).cumsum()
    ts2 = datetime.now().timestamp() + abs(np.random.normal(2000, 10, n)).cumsum()

    df = pd.DataFrame({
        'c1': np.random.choice(list('ABCDEFGH'), n),
        'c2': np.random.choice(list('ABCDEFGH'), n),
        'c3': np.random.choice(list('ABCDEFGH'), n),
        'i1': np.random.randint(1, 20, n),
        't1': pd.to_datetime(ts1, unit='s'),
        't2': pd.to_datetime(ts2, unit='s'),
        'x1': np.random.randn(n),
        'x2': np.random.randn(n),
        'x3': np.random.randn(n)
        })

    return df

df = make_data(100)

Then convert all date times to just the date with:

# convert datetimes to just the date
for col in df.columns:
    if df[col].dtype == 'datetime64[ns]':
        df[col] = df[col].dt.date
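That said, if you'd rather not write the explicit loop, the same recast can be collapsed into the asker's own select_dtypes approach in one expression:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['alice', 'bob'],
    't1': pd.to_datetime(['2023-05-01 14:30:00', '2023-06-02 09:15:00']),
})

# Find every datetime64 column, then recast each one to plain dates.
date_cols = df.select_dtypes(include=['datetime64']).columns
df[date_cols] = df[date_cols].apply(lambda s: s.dt.date)
```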

This is easily turned into a generic function with error checking (try/except), or adapted to convert objects to category, or int to float, or...
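A minimal sketch of such a function - the name, parameters, and defaults here are my own choices, not an established API:

```python
import pandas as pd

def recast_columns(df, from_dtype='datetime64[ns]', caster=lambda s: s.dt.date):
    """Recast every column of the given dtype, skipping any that fail."""
    for col in df.columns:
        if df[col].dtype == from_dtype:
            try:
                df[col] = caster(df[col])
            except (AttributeError, TypeError):
                pass  # leave the column untouched if the cast fails
    return df

out = recast_columns(pd.DataFrame({'t': pd.to_datetime(['2023-05-01 14:30:00'])}))
```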

Randall Goodwin