
I have a column named "date_time" that holds a date and time stamp for some year. I want to find which part of the day each time falls into (morning, noon, evening or night) so that I can extract features as follows:

  • if date_time.dt.hour >= 5 and date_time.dt.hour < 12 --> morning
  • if date_time.dt.hour >= 12 and date_time.dt.hour < 17 --> noon
  • if date_time.dt.hour >= 17 and date_time.dt.hour < 20 --> evening
  • else --> night

But I'm unable to filter like this using the .dt.hour accessor on a pandas datetime column (as produced by to_datetime). Please help me achieve this.
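For reference, the rule above can be written as a plain Python function applied to the hour of each row. This is a minimal sketch, assuming the column is named `date_time` and already has dtype datetime64; `part_of_day` is a hypothetical helper name and the sample timestamps are made up. Because `.map` calls the function once per row in Python, it illustrates the logic rather than being the fastest option on large frames.

```python
import pandas as pd

def part_of_day(hour: int) -> str:
    """Map an hour (0-23) to a label using the boundaries from the question."""
    if 5 <= hour < 12:
        return "morning"
    elif 12 <= hour < 17:
        return "noon"
    elif 17 <= hour < 20:
        return "evening"
    else:
        return "night"

# Hypothetical sample data; "date_time" stands in for the real column.
df = pd.DataFrame({"date_time": pd.to_datetime([
    "2019-11-05 06:30:00", "2019-11-05 13:00:00",
    "2019-11-05 18:45:00", "2019-11-05 23:10:00",
])})

# .dt.hour gives an integer Series; .map applies the scalar function to each value.
df["part_of_day"] = df["date_time"].dt.hour.map(part_of_day)
print(df["part_of_day"].tolist())  # ['morning', 'noon', 'evening', 'night']
```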

desertnaut
  • Question has nothing to do with `machine-learning` - kindly do not spam irrelevant tags (removed & replaced with `pandas`). – desertnaut Nov 05 '19 at 10:56

2 Answers


You should get the datetime from the timestamp. There are multiple ways to do this; you can refer to Converting between datetime and Pandas Timestamp objects.

Once you have a time object with its HH, MM, SS, etc. parts, you can apply your logic for morning, noon, evening and night. One point here: the standard representations are AM/PM or 24-hour (HH) time, so if you want labels like Morning or Evening you have to match the time against your own conditions rather than search for a direct method. Again, that's my opinion. Specifically, I would:

  • Step-1: Format the timestamp as a string, giving something like YY-MM-DD HH:MM:SS
  • Step-2: Extract HH and MM from the string
  • Step-3: Cast these strings to numbers and apply your logic
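The three steps could be sketched in pandas as follows, assuming a datetime64 column named `date_time` (the sample values are made up). The `.str.slice` positions assume the exact `%Y-%m-%d %H:%M:%S` layout produced in step 1, with a four-digit year.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"date_time": pd.to_datetime([
    "2019-11-05 06:30:00", "2019-11-05 18:45:00",
])})

# Step 1: format each timestamp as a "YYYY-MM-DD HH:MM:SS" string.
as_text = df["date_time"].dt.strftime("%Y-%m-%d %H:%M:%S")

# Step 2: extract HH (characters 11-12) and MM (characters 14-15).
hh_text = as_text.str.slice(11, 13)
mm_text = as_text.str.slice(14, 16)

# Step 3: cast to integers so they can be compared against hour boundaries.
hours = hh_text.astype(int)
minutes = mm_text.astype(int)
print(hours.tolist(), minutes.tolist())  # [6, 18] [30, 45]
```

For a datetime64 column, `df["date_time"].dt.hour` and `.dt.minute` return the same integers directly, without the round trip through strings.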
Shubham Chauhan

This is an addition to Shubham's answer. I assume that, following what he described, you are able to extract the timestamps properly.

Depending on your use case, you may want to modify the column itself, or you could add another column to store these values. Let me explain the process using the latter. To run through example code, let me call this new column part_of_day. It can be created as

df["part_of_day"] = ""

Next, you fill in values in this column based on certain conditions. The general syntax for this is:

df.loc[<mask to generate the labels to index> , <optional column(s)>] = <some value>

For your case, one of the conditions might look like

df.loc[(df.date_time.dt.hour >= 5) & (df.date_time.dt.hour < 12), "part_of_day"] = "morning"
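A minimal sketch applying all four of the question's conditions with `df.loc`, assuming `df` has a datetime64 column `date_time` (sample values are made up). Rather than starting from an empty string, it defaults every row to "night", so the question's else branch needs no mask of its own. Each comparison is wrapped in parentheses because `&` binds more tightly than `>=` and `<`.

```python
import pandas as pd

# Hypothetical sample data; "date_time" is the column name used in the question.
df = pd.DataFrame({"date_time": pd.to_datetime([
    "2019-11-05 03:00:00", "2019-11-05 06:30:00", "2019-11-05 13:00:00",
    "2019-11-05 18:45:00", "2019-11-05 23:10:00",
])})
hour = df["date_time"].dt.hour

# Default every row to "night"; the three masks below overwrite the other cases.
df["part_of_day"] = "night"

# Each mask is a boolean Series; & combines two masks element-wise.
df.loc[(hour >= 5) & (hour < 12), "part_of_day"] = "morning"
df.loc[(hour >= 12) & (hour < 17), "part_of_day"] = "noon"
df.loc[(hour >= 17) & (hour < 20), "part_of_day"] = "evening"

print(df["part_of_day"].tolist())  # ['night', 'morning', 'noon', 'evening', 'night']
```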
Anant Mittal
  • Hi Anant, thanks for your help. I have an existing column called "current_date_time" of datetime64[ns] dtype. I took only the hours from that column and derived a new column "hours", after which I derived the column "part_of_day" from my conditions using an if-elif construct. Let me know your thoughts; I hope I can proceed with ML modelling after dropping the "current_date_time" and "hours" columns. – user11720648 Nov 06 '19 at 05:20
  • Hi @user11720648, glad that you found my answer helpful. While your solution is logically correct, I still wouldn't suggest going with the if-else construct, because it can be very slow for bigger datasets. It's good that you have created an hour column, but I would suggest using the `df.loc[]` construct that I suggested. All the best, and do accept the answer if it works for you. :) – Anant Mittal Nov 06 '19 at 05:27
  • Hi Anant, I tried as per your suggestion --> t['hour'].loc[t['hour']>=5 and t['hour']<12,"part_of_day"]='Morning' (or) --> t.loc[t.current_date_time.dt.hour >=5 and t.current_date_time.dt.hour <12 , "part_of_day"] = "Morning", but I'm getting a "The truth value of a Series is ambiguous" error. Is there any fix for this error? – user11720648 Nov 07 '19 at 02:40
  • I see two statements; can you tell me exactly which one you tried? – Anant Mittal Nov 07 '19 at 04:28
  • I tried both statements, but the error is the same (The truth value of a Series is ambiguous). – user11720648 Nov 07 '19 at 14:10
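The error in both statements comes from Python's `and`: it needs a single True/False for each operand, and pandas refuses to collapse a multi-element Series into one bool, so it raises a ValueError. The sketch below reproduces this and shows two element-wise fixes, using a made-up frame named `t` after the one in the comments. Separately, the first statement calls `.loc` on the Series `t['hour']`, which has no "part_of_day" column; the assignment belongs on the DataFrame `t` itself.

```python
import pandas as pd

# Hypothetical sample frame named after the one in the comments.
t = pd.DataFrame({"current_date_time": pd.to_datetime([
    "2019-11-07 04:00:00", "2019-11-07 09:15:00", "2019-11-07 14:40:00",
])})
hour = t["current_date_time"].dt.hour

# Reproduce the error: `and` calls bool() on the Series (hour >= 5).
error_message = ""
try:
    t.loc[hour >= 5 and hour < 12, "part_of_day"] = "Morning"
except ValueError as err:
    error_message = str(err)
print(error_message)

# Fix 1: element-wise & with each comparison in parentheses.
# Rows that do not match are left as NaN in the new column.
t.loc[(hour >= 5) & (hour < 12), "part_of_day"] = "Morning"

# Fix 2: Series.between is inclusive on both ends by default,
# so between(5, 11) selects the same integer hours as 5 <= hour < 12.
mask = hour.between(5, 11)
print(mask.tolist())  # [False, True, False]
```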