2

Imagine I have the following dataframe:

import pandas as pd
mydict = {
    'date_time':['2019-01-01','2019-01-02','2019-01-03','2019-01-04'],
    'value':[5,3,7,2]
}

df = pd.DataFrame(mydict)

I would like to create the following output:

    date_time  value    Number of previous rows with larger value
0  2019-01-01      5                0
1  2019-01-02      3                1
2  2019-01-03      7                0
3  2019-01-04      2                3

The logic behind the latter column is that it checks all the values of previous columns, and count the number of values that are larger than in the current row.

So in the first row where the value is 0, no previous rows exists, hence the number is 0.

In the second row, the value is 3, and the previous row had a larger value (5), meaning the "number" is

In the 3rd row, the value is 7 and neither row 1 or row 2 are larger so the number is 0.

In the 4th row where the value is 2, row 1,2,3 are larger meaning the number is 3.

What is the most efficient way to accomplish this?

MathiasRa
  • 825
  • 2
  • 12
  • 24
  • 1
    Use jezrael's answer from this post: https://stackoverflow.com/questions/51039857/pandas-count-values-greater-than-current-row-in-the-last-n-rows I would make the window size bigger depending on your data set – oppressionslayer Dec 05 '19 at 05:43
  • 2
    Does this answer your question? [Pandas count values greater than current row in the last n rows](https://stackoverflow.com/questions/51039857/pandas-count-values-greater-than-current-row-in-the-last-n-rows) – Serkan Arslan Dec 05 '19 at 05:47
  • Yes thank you. This is great – MathiasRa Dec 05 '19 at 06:08

1 Answers1

0

We can provide you not the most efficient solution, but it works

import pandas as pd
mydict = {
    'date_time':['2019-01-01','2019-01-02','2019-01-03','2019-01-04','2019-01-04', '2019-01-04'],
    'value':[5,3,7,2,4,7]
}

df = pd.DataFrame(mydict)


df.loc[0:0,'count'] = 0

for i in range(df.shape[0]):
    df.loc[i:i,'count'] = df[:i+1][df['value'].loc[:i] > df['value'].loc[i]].shape[0]

df['count'] = df['count'].astype("int32")
NKam
  • 59
  • 9