
Hi, I have a DataFrame that looks like this. For rows that share the same datetime, I need to keep the last row as 1 and replace the remaining ones with 0. Is there any way to do this?

DateTimeStarted          Value
0   2020-12-19 16:00:00  1
1   2020-12-19 16:00:00  1
2   2020-12-19 16:00:00  1
3   2020-12-19 16:00:00  1
4   2020-12-19 16:00:00  1
5   2020-12-19 16:00:00  1
6   2020-12-19 16:00:00  1
7   2020-12-19 16:00:00  1
8   2020-12-19 16:00:00  1
9   2020-12-19 16:00:00  1
10  2020-12-19 16:00:00  1
11  2020-12-19 16:00:00  1
12  2020-12-19 16:00:00  1
13  2020-12-19 16:00:00  1
14  2020-12-19 16:00:00  1
15  2020-12-19 16:00:00  1
16  2020-12-19 16:00:00  1
17  2020-12-19 16:00:00  1
18  2020-12-19 16:00:00  1
19  2020-12-26 18:00:00  1
20  2020-12-26 18:00:00  1
21  2020-12-27 13:00:00  0
22  2020-12-27 14:00:00  0
23  2020-12-27 15:00:00  0
24  2020-12-27 15:00:00  0
25  2020-12-27 17:00:00  0

The solution should look like this. Rows whose value is already 0 should remain unchanged.

DateTimeStarted          Value
0   2020-12-19 16:00:00  0
1   2020-12-19 16:00:00  0
2   2020-12-19 16:00:00  0
3   2020-12-19 16:00:00  0
4   2020-12-19 16:00:00  0
5   2020-12-19 16:00:00  0
6   2020-12-19 16:00:00  0
7   2020-12-19 16:00:00  0
8   2020-12-19 16:00:00  0
9   2020-12-19 16:00:00  0
10  2020-12-19 16:00:00  0
11  2020-12-19 16:00:00  0
12  2020-12-19 16:00:00  0
13  2020-12-19 16:00:00  0
14  2020-12-19 16:00:00  0
15  2020-12-19 16:00:00  0
16  2020-12-19 16:00:00  0
17  2020-12-19 16:00:00  0
18  2020-12-19 16:00:00  1
19  2020-12-26 18:00:00  0
20  2020-12-26 18:00:00  1
21  2020-12-27 13:00:00  0
22  2020-12-27 14:00:00  0
23  2020-12-27 15:00:00  0
24  2020-12-27 15:00:00  0
25  2020-12-27 17:00:00  0
DDM

2 Answers


Try this:

((~df.DateTimeStarted.duplicated(keep='last')) & (df.Value.ne(0))).astype(int)

Output:

0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     0
9     0
10    0
11    0
12    0
13    0
14    0
15    0
16    0
17    0
18    1
19    0
20    1
21    0
22    0
23    0
24    0
25    0
dtype: int32
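A self-contained sketch of the same one-liner, assuming the sample data is rebuilt from the listing in the question and that you want the result written back into `df['Value']` (the expression on its own only returns a new Series):

```python
import pandas as pd

# Rebuild the sample frame from the question: 19 rows at 2020-12-19 16:00,
# 2 rows at 2020-12-26 18:00, then five rows whose Value is already 0.
times = (
    ["2020-12-19 16:00:00"] * 19
    + ["2020-12-26 18:00:00"] * 2
    + ["2020-12-27 13:00:00", "2020-12-27 14:00:00",
       "2020-12-27 15:00:00", "2020-12-27 15:00:00",
       "2020-12-27 17:00:00"]
)
df = pd.DataFrame({
    "DateTimeStarted": pd.to_datetime(times),
    "Value": [1] * 21 + [0] * 5,
})

# True only for the last occurrence of each timestamp...
is_last = ~df["DateTimeStarted"].duplicated(keep="last")
# ...and only where the original value is non-zero, so existing zeros stay 0.
df["Value"] = (is_last & df["Value"].ne(0)).astype(int)

print(df["Value"].tolist())
```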
ashkangh
  • Hi, is there any advantage to using the factorize method over the apply method? – DDM Mar 30 '21 at 16:57
  • There is nothing wrong with using the `apply` method, but `apply` is much like running a `for` loop over the DataFrame, which is slow and inefficient. Built-in NumPy and pandas methods and functions, on the other hand, give you vectorized calculations. Please check [this](https://stackoverflow.com/questions/54432583/when-should-i-not-want-to-use-pandas-apply-in-my-code) for more info. – ashkangh Mar 30 '21 at 17:02
  • Are you sure that your `factorize()` method is faster than the `apply()` method? – Anurag Dabas Mar 30 '21 at 17:05
  • Sorry, I misunderstood your expected output. I updated my answer. – ashkangh Mar 30 '21 at 17:17
  • @Anurag Dabas, for big datasets I believe it is! – ashkangh Mar 30 '21 at 17:18
  • Apologies, I didn't make myself clear enough. My dataset also has values that are zero, and those have to remain unchanged. The code above changes the last row of every unique datetime to 1. – DDM Mar 30 '21 at 17:23
  • I updated my answer. Let me know how it works. – ashkangh Mar 30 '21 at 17:27
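To make the `apply`-versus-vectorized point from the comments concrete, here is a sketch that computes the same column two ways: once with a per-group Python function (via `groupby(...).transform`, standing in for an `apply`-style approach, since the earlier `apply`/`factorize` versions are not shown on this page) and once with the vectorized expression from the answer. The frame below is a hypothetical toy example, and the sketch only checks that both results agree; it makes no timing claim.

```python
import pandas as pd

# Hypothetical toy frame: two duplicated timestamps plus a pre-existing zero group.
df = pd.DataFrame({
    "DateTimeStarted": pd.to_datetime([
        "2020-12-19 16:00", "2020-12-19 16:00", "2020-12-19 16:00",
        "2020-12-26 18:00", "2020-12-26 18:00",
        "2020-12-27 15:00", "2020-12-27 15:00",
    ]),
    "Value": [1, 1, 1, 1, 1, 0, 0],
})

# Per-group Python callable: keep the last row's original value, zero the rest.
# pandas hands each group to this function one at a time (a Python-level loop).
def keep_last(s):
    return s.where(s.index == s.index[-1], 0)

looped = df.groupby("DateTimeStarted")["Value"].transform(keep_last)

# Vectorized expression from the answer: whole-column operations, no Python loop.
vectorized = (
    ~df["DateTimeStarted"].duplicated(keep="last") & df["Value"].ne(0)
).astype(int)

print(looped.tolist())
print(vectorized.tolist())
```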

Just use the duplicated() method and store the last row of each datetime in a variable:

uniquedf=df[~df.duplicated(subset=['DateTimeStarted'],keep='last')]

Now set the 'Value' column of your df to 0:

df['Value']=0

Then use the reindex() and fillna() methods:

result=uniquedf.reindex(df.index).fillna(df)

Finally, change the dtype of the 'Value' column back to int with the astype() method (the reindex step introduces NaN, which turns the column into float):

result['Value']=result['Value'].astype(int)

Now if you print result, you will get your desired output.
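The four steps above can be combined into one runnable sketch. It assumes the sample data is rebuilt from the listing in the question; the steps themselves are unchanged from the answer:

```python
import pandas as pd

# Sample data rebuilt from the question's listing.
times = (
    ["2020-12-19 16:00:00"] * 19
    + ["2020-12-26 18:00:00"] * 2
    + ["2020-12-27 13:00:00", "2020-12-27 14:00:00",
       "2020-12-27 15:00:00", "2020-12-27 15:00:00",
       "2020-12-27 17:00:00"]
)
df = pd.DataFrame({
    "DateTimeStarted": pd.to_datetime(times),
    "Value": [1] * 21 + [0] * 5,
})

# 1. Keep only the last row of each timestamp (its original Value is preserved).
uniquedf = df[~df.duplicated(subset=["DateTimeStarted"], keep="last")]

# 2. Zero the whole column; uniquedf is a separate frame, so it is unaffected.
df["Value"] = 0

# 3. Reindex to the full index: rows missing from uniquedf become NaN/NaT,
#    and fillna(df) fills them from df (Value 0, original timestamps).
result = uniquedf.reindex(df.index).fillna(df)

# 4. The NaN step made Value a float column, so cast it back to int.
result["Value"] = result["Value"].astype(int)

print(result)
```

Because step 1 keeps the original Value of each last row instead of forcing it to 1, a timestamp whose last row is already 0 (for example 2020-12-27 15:00:00) stays 0.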

Anurag Dabas
  • Hi, sorry, the apply method is not working out well. I have other rows with other datetimes in the DataFrame, and the last row of each unique datetime has been replaced with 1. Is there a workaround for this method? – DDM Mar 30 '21 at 17:16