I have a column of a dataframe that is like this
time
0 2017-03-01 15:30:00
1 2017-03-01 16:00:00
2 2017-03-01 16:30:00
3 2017-03-01 17:00:00
4 2017-03-01 17:30:00
5 2017-03-01 18:00:00
6 2017-03-01 18:30:00
7 2017-03-01 19:00:00
8 2017-03-01 19:30:00
9 2017-03-01 20:00:00
10 2017-03-01 20:30:00
11 2017-03-01 21:00:00
12 2017-03-01 21:30:00
13 2017-03-01 22:00:00
.
.
.
I want to "encode" the time of the day. I want to do this by firsly assigning each half an-hour a integer number. Starting from
00:30:00 --> 1
01:00:00 --> 2
01:30:00 --> 3
02:00:00 --> 4
02:30:00 --> 5
and so on. Therefore we would have 48
numbers (since there are 24 hours). I would like to find the fastest way of transforming my column into a list/column containing those values.
So far I can do this for one value. For instance
2*int(timeDF.iloc[0][11:13]) + int(int(timeDF.iloc[0][14:16])/30)
would transform 15:30:00
into 31
.
I think I could do this by doing a loop where instead of using 0
I use an index that loops through the length of the column. However is there a faster way?
one hot encoding
After finding those values, I would use some one-hot-encoder, I think sklearn has one. But the most difficult part is this
stupid solution
labels = []
for date in time:
labels.append(2*int(date[11:13]) + int(int(date[14:16])/30))
This would contain the values and then one could do something like here