How do we detect date and datetime columns separately from a dataframe?

Question

I currently use pd.read_csv to read the dataframe. It does not detect both date and datetime columns, but instead checks it out as object. So, I use the below snippet to find date/datetime columns:

df = pd.read_csv(filename)
collist = []
for col in df.columns:
    if df[col].dtype == 'object':
        try:
            df[col] = pd.to_datetime(df[col])
            collist.append(col)
        except ValueError:
            pass
return collist

But my use case needs me to have date columns and datetime columns separately. Is there a way how we can filter out date and datetime columns separately?

import pandas as pd
df = pd.DataFrame({
    'date1':['4/10/2021', '4/11/2021','4/12/2021'],
    'date_time1': ['4/11/2021 13:23:45', '4/11/2021 13:23:45', '4/11/2021 13:23:45'],
    'Name': ['Bob', 'Jane', 'Alice'],
    'date_time2': ['4/12/2021 13:23:45', '4/13/2021 13:23:45', '4/14/2021 13:23:45']
})

So, date column list should give me [date1] and datetime column list should give me [date_time1,date_time2]

Can you please provide an example of input and expected output. Also, by the end of the function ^ everything will be `datetime`, since that's what overwriting columns with `pd.to_datetime()` does. — user3471881, Sep 26 '21 at 06:28
So, for example I have a dataframe with a column having value like: 4/11/2021 and another with 4/11/2021 12:30:34 so I want two separate list of columns. One with date and another with datetime. — LearnerJS, Sep 26 '21 at 06:35
Please follow the examples in this thread to create a good reproducible example in pandas: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — user3471881, Sep 26 '21 at 06:37
If you could prescribe a set of directives for parsing to date/time, I think this would be much more robust. Also note that pandas' datetime does not distinguish between date and datetime, as opposed to "native Python". — FObersteiner, Sep 26 '21 at 08:00
@MrFuppes I do not understand it. If I read directly all these are string objects. If I use parse_dates, it converts all to timestamp. How can I distinguish in between these values? Can you suggest an approach? — LearnerJS, Sep 26 '21 at 08:37
I mean if you expect date/time strings with certain formats (yyyy-mm-dd, mm-dd-yyyy etc.), it's easy to distinguish between strings that contain date or date and time. — FObersteiner, Sep 26 '21 at 08:41

score 1 · Answer 1 · answered Sep 26 '21 at 09:09

1

Since you have already read the data and converted everything to datetime and were storing it in collist - (datecollist), use the below snippet to parse these timestamps and distinguish between date and datetime.

datetime_col_list = []
df = pd.read_csv(filename, delimiter=delimiter, encoding=encoding, parse_dates=date_collist)
for col in date_collist:
        first_index = df[col].first_valid_index()
        first_valid_value = df[col].loc[first_index]
        if (str(first_valid_value).split(' ')[1]) != '00:00:00':
            datetime_col_list.append(col)

date_list = list (set(date_collist) - set(datetime_col_list))
print(date_list)
print(datetime_col_list)

answered Sep 26 '21 at 09:09

Shashwat

462
2
9

Yes, it works but Will it work all the time? – LearnerJS Sep 26 '21 at 09:11
I don't think this is reliable; what if the time just happens to be `00:00:00`? you would incorrectly classify as date despite the possibility that the rest of the column could have different times – FObersteiner Sep 26 '21 at 14:11
I thought even you wrote the same logic @MrFuppes, didn't you? – LearnerJS Sep 26 '21 at 18:52
@LearnerJS, I check *all* elements (`.all()`), not just the first valid – FObersteiner Sep 26 '21 at 19:31

How do we detect date and datetime columns separately from a dataframe?

1 Answers1