1

I currently use pd.read_csv to read the dataframe. It does not detect both date and datetime columns, but instead checks it out as object. So, I use the below snippet to find date/datetime columns:

df = pd.read_csv(filename)
collist = []
for col in df.columns:
    if df[col].dtype == 'object':
        try:
            df[col] = pd.to_datetime(df[col])
            collist.append(col)
        except ValueError:
            pass
return collist

But my use case needs me to have date columns and datetime columns separately. Is there a way how we can filter out date and datetime columns separately?

import pandas as pd
df = pd.DataFrame({
    'date1':['4/10/2021', '4/11/2021','4/12/2021'],
    'date_time1': ['4/11/2021 13:23:45', '4/11/2021 13:23:45', '4/11/2021 13:23:45'],
    'Name': ['Bob', 'Jane', 'Alice'],
    'date_time2': ['4/12/2021 13:23:45', '4/13/2021 13:23:45', '4/14/2021 13:23:45']
})

So, date column list should give me [date1] and datetime column list should give me [date_time1,date_time2]

user3471881
  • 2,614
  • 3
  • 18
  • 34
LearnerJS
  • 298
  • 2
  • 14
  • Can you please provide an example of input and expected output. Also, by the end of the function ^ everything will be `datetime`, since that's what overwriting columns with `pd.to_datetime()` does. – user3471881 Sep 26 '21 at 06:28
  • So, for example I have a dataframe with a column having value like: 4/11/2021 and another with 4/11/2021 12:30:34 so I want two separate list of columns. One with date and another with datetime. – LearnerJS Sep 26 '21 at 06:35
  • Please follow the examples in this thread to create a good reproducible example in pandas: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – user3471881 Sep 26 '21 at 06:37
  • @user3471881 Edited! Thanks. – LearnerJS Sep 26 '21 at 06:50
  • If you could prescribe a set of directives for parsing to date/time, I think this would be much more robust. Also note that pandas' datetime does not distinguish between date and datetime, as opposed to "native Python". – FObersteiner Sep 26 '21 at 08:00
  • @MrFuppes I do not understand it. If I read directly all these are string objects. If I use parse_dates, it converts all to timestamp. How can I distinguish in between these values? Can you suggest an approach? – LearnerJS Sep 26 '21 at 08:37
  • I mean if you expect date/time strings with certain formats (yyyy-mm-dd, mm-dd-yyyy etc.), it's easy to distinguish between strings that contain date or date and time. – FObersteiner Sep 26 '21 at 08:41

1 Answers1

1

Since you have already read the data and converted everything to datetime and were storing it in collist - (datecollist), use the below snippet to parse these timestamps and distinguish between date and datetime.

datetime_col_list = []
df = pd.read_csv(filename, delimiter=delimiter, encoding=encoding, parse_dates=date_collist)
for col in date_collist:
        first_index = df[col].first_valid_index()
        first_valid_value = df[col].loc[first_index]
        if (str(first_valid_value).split(' ')[1]) != '00:00:00':
            datetime_col_list.append(col)

date_list = list (set(date_collist) - set(datetime_col_list))
print(date_list)
print(datetime_col_list)
Shashwat
  • 462
  • 2
  • 9