Imagine I have the following dataframe:
import pandas as pd
mydict = {
'date_time':['2019-01-01','2019-01-02','2019-01-03','2019-01-04'],
'value':[5,3,7,2]
}
df = pd.DataFrame(mydict)
I would like to create the following output:
date_time value Number of previous rows with larger value
0 2019-01-01 5 0
1 2019-01-02 3 1
2 2019-01-03 7 0
3 2019-01-04 2 3
The logic behind the latter column is that it checks all the values of previous columns, and count the number of values that are larger than in the current row.
So in the first row where the value is 0, no previous rows exists, hence the number is 0.
In the second row, the value is 3, and the previous row had a larger value (5), meaning the "number" is
In the 3rd row, the value is 7 and neither row 1 or row 2 are larger so the number is 0.
In the 4th row where the value is 2, row 1,2,3 are larger meaning the number is 3.
What is the most efficient way to accomplish this?